Gene expression profiles and methods of use

ABSTRACT

The present invention relates to gene expression profiles for lung cancer, microarrays comprising nucleic acid sequences representing gene expression profiles, and methods of using expression profiles and microarrays. The invention also provides methods and compositions for diagnostic assays for detecting cancer and therapeutic methods and compositions for treating cancer. The invention also provides methods for designing, identifying, and optimizing therapeutics for cancer.

This application claims benefit of U.S. Provisional Application Ser. No. 60/508,355, filed Oct. 3, 2003, the contents of which are incorporated herein by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates to gene expression profiles, microarrays comprising nucleic acid sequences representing gene expression profiles, and methods of using gene expression profiles and microarrays.

BACKGROUND OF THE INVENTION

Many disease states are characterized by differences in the expression levels of various genes either through changes in the copy number of the genomic DNA or through changes in levels of transcription of particular genes (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.). For example, losses and gains of genetic material play an important role in malignant transformation and progression. These gains and losses are thought to be “driven” by at least two kinds of genes, oncogenes and tumor suppressor genes. Oncogenes are positive regulators of tumorigenesis, while tumor suppressor genes are negative regulators of tumorigenesis (Marshall, Cell 64:313-326, 1991; Weinberg, Science 254:1138-1146, 1991). Therefore, one mechanism of activating unregulated growth is to increase the number of genes coding for oncogene proteins or to increase the level of expression of these oncogenes (e.g., in response to cellular or environmental changes), and another mechanism is to lose genetic material or to decrease the level of expression of genes that code for tumor suppressors. This model is supported by the losses and gains of genetic material associated with glioma progression (Mikkelson, et al., J. Cellular Biochem. 46:3-8, 1991). Thus, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.

Compounds which are used as therapeutics to treat these various diseases (e.g., cancer) presumably reverse some, or all, of these gene expression changes. The expression change of at least some of these genes may, therefore, be used as a method to monitor, or even predict, the efficacy of such therapeutics. The analysis of these expression changes may be performed in the target tissue of interest (e.g., tumor) or in some surrogate cell population (e.g., peripheral blood leukocytes). In the latter case, correlation of the gene expression changes with efficacy (e.g., tumor shrinkage or non-growth) must be especially strong for the expression change pattern to be used as a marker for efficacy.

A number of laboratories have reported success in using gene expression analysis, via microarrays or other methods, to classify human tumors at the molecular level (Bittner, et al., Nature 406:536-540, 2000; Alon, et al., Proc. Natl. Acad. Sci. USA 96:6745-6750, 1999; Alizadeh, et al., Nature 403:503-511, 2000; Golub, et al., Science 286:531-537, 1999; Perou, et al., Proc. Natl. Acad. Sci. 96:9212-9217, 1999; Kahn, et al., Am. J. Pathol. 156:1887-1900, 2000). Genes, either individually or as a subset, identified in this way may be used as markers that could be tracked for changes that correlate with efficacy of a therapeutic compound(s) or to predict which patients might benefit from a particular therapeutic. In the work described herein, total RNA was isolated from ten human lung tumors and from normal adjacent tissue (NAT), and the RNA from each sample was analyzed using Affymetrix technology.

SUMMARY OF THE INVENTION

The present invention is directed to gene expression profiles for lung cancer, microarrays comprising nucleic acid sequences representing said gene expression profiles, and methods of using said gene expression profiles and microarrays.

In one embodiment of the present invention, the gene expression profile is an expression profile comprising one or more genes (e.g., SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399) that demonstrate altered expression in human lung tumors versus normal adjacent tissue (NAT).

In another embodiment, the expression profile is an expression profile comprising one or more polypeptides (e.g., SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 369, 398, 400) that demonstrate altered expression in human lung tumors versus normal adjacent tissue (NAT).

In another embodiment of the present invention, the gene expression profile may be an expression profile comprising one or more genes selected from the group consisting of the genes listed in Tables 1 to 3. In a further embodiment of the present invention, the gene expression profiles comprise one or more biomarkers isolated from the group comprising the genes listed in the Tables.

The present invention is also directed to the discovery of the gene expression profile of human lung tumors and normal adjacent tissue. As described in the Examples and in the Tables, human lung tumors have genes which are expressed at higher levels (i.e., which are up-regulated) and genes which are expressed at lower levels (i.e., which are down-regulated) relative to normal adjacent tissue. Sets of genes which are up-regulated or down-regulated are referred to herein as “genes characteristic of human lung tumor tissue.”

Also within the scope of the present invention are microarrays comprising one or more genes that demonstrate altered expression in human lung tumor tissue. In another embodiment of the present invention, the microarray may be a microarray comprising one or more genes selected from the group consisting of the genes listed in the Tables. In a further embodiment, the microarray may be a microarray comprising one or more biomarkers isolated from the group comprising the genes listed in the Tables.

In addition, it is an objective of the invention to provide methods and reagents for the prediction, diagnosis, prognosis, and therapy of cancer.

This invention also relates to methods for using said microarrays which include, but are not limited to, screening the effects of a drug or treatment on tissue or cell samples, screening toxicity effects on tissue or cell samples, identifying a disease state in a tissue or cell sample, providing a patient diagnosis, predicting a patient's response to treatment, distinguishing between control and drug-treated samples, distinguishing between normal and tumor samples, discovering novel drugs, and determining the level of gene expression in a tissue or cell sample.

Another embodiment of the present invention is a method for screening the effects of a drug on a tissue or cell sample comprising the step of analyzing the level of expression of one or more genes (e.g., SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399) and/or gene products (e.g., SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 369, 398, 400), wherein the gene expression and/or gene product levels in the tissue or cell sample are analyzed before and after exposure to the drug, and a variation in the expression level of the gene and/or gene product is indicative of a drug effect or provides a patient diagnosis or predicts a patient's response to the treatment.

Another aspect of the present invention is a method for discovering novel drugs comprising the step of analyzing the level of expression of one or more genes and/or gene products, wherein the gene expression and/or gene product levels of the cells are analyzed before and after exposure to the drug, and a variation in the expression level of the gene and/or gene product is indicative of drug efficacy.

The invention further provides a method for identifying a compound useful for the treatment of cancer comprising administering to a subject with cancer a test compound, and measuring the activity of the polypeptide (e.g., the polypeptides encoded by SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 369, 398, 400), wherein a change in the activity of the polypeptide is indicative of the test compound being useful for the treatment of cancer.

The invention, thus, provides methods which may be used to identify compounds which may act, for example, as regulators or modulators such as agonists and antagonists, partial agonists, inverse agonists, activators, co-activators, and inhibitors. Accordingly, the invention provides reagents and methods for regulating the expression of a polynucleotide or a polypeptide associated with cancer. Reagents that modulate the expression, stability, or amount of a polynucleotide or the activity of the polypeptide may be a protein, a peptide, a peptidomimetic, a nucleic acid, a nucleic acid analogue (e.g., peptide nucleic acid, locked nucleic acid), or a small molecule.

The present invention also provides a method for providing a patient diagnosis comprising the step of analyzing the level of expression of one or more genes and/or gene products, wherein the gene expression and/or gene product levels of normal and patient samples are analyzed, and a variation in the expression level of the gene and/or gene product in the patient sample is diagnostic of a disease. The patient samples include, but are not limited to, blood, amniotic fluid, plasma, semen, bone marrow, and tissue biopsy.

The present invention still further provides a method of diagnosing cancer in a subject comprising measuring the activity of the polypeptide in a subject suspected of having cancer, wherein if there is a difference in the activity of the polypeptide, relative to the activity of the polypeptide in a subject not suspected of having cancer, then the subject is diagnosed as having cancer.

In another embodiment, the invention provides a method for detecting cancer in a patient sample in which an antibody to a protein is used to react with proteins in the patient sample.

Another aspect of the present invention is a method for distinguishing between normal and disease states comprising the step of analyzing the level of expression of one or more genes and/or gene products, wherein the gene expression and/or gene product levels of normal and disease tissues are analyzed, and a variation in the expression level of the gene and/or gene product is indicative of a disease state.

In another embodiment, the invention pertains to a method of determining the phenotype of cells comprising detecting the differential expression, relative to normal cells, of at least one gene, wherein the gene is differentially expressed by at least a factor of two, at least a factor of five, at least a factor of twenty, or at least a factor of fifty.
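By way of illustration only, the fold-difference thresholds recited above (two, five, twenty, and fifty) might be applied as in the following minimal Python sketch. The function names and the example expression values are hypothetical and do not form part of the disclosed method; expression levels are assumed to be positive measurements on a common scale.

```python
# Illustrative sketch only; names and values are hypothetical.
THRESHOLDS = (50, 20, 5, 2)  # fold-difference tiers named in the text, largest first


def fold_change(sample_level, normal_level):
    """Return (fold difference >= 1, direction) for a sample versus normal cells."""
    if sample_level <= 0 or normal_level <= 0:
        raise ValueError("expression levels must be positive")
    if sample_level > normal_level:
        return sample_level / normal_level, "up"
    if sample_level < normal_level:
        return normal_level / sample_level, "down"
    return 1.0, "unchanged"


def highest_tier(sample_level, normal_level):
    """Return the largest threshold that is met, or None if below a factor of two."""
    fold, _ = fold_change(sample_level, normal_level)
    for threshold in THRESHOLDS:
        if fold >= threshold:
            return threshold
    return None
```

For instance, a gene measured at 600 units in the sample and 100 units in normal cells differs by a factor of six and therefore meets the "at least a factor of five" tier but not the "at least a factor of twenty" tier.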

In yet another embodiment, the invention pertains to a method of determining the phenotype of cells, comprising detecting the differential expression, relative to normal cells, of at least one polypeptide, wherein the polypeptide is differentially expressed by at least a factor of two, at least a factor of five, at least a factor of twenty, or at least a factor of fifty.

In another embodiment, the invention pertains to a method for determining the phenotype of cells from a patient by providing a nucleic acid probe comprising a nucleotide sequence having at least about 10, at least about 15, at least about 25, or at least about 40 consecutive nucleotides, obtaining a first sample of cells from a patient, optionally providing a second sample of cells substantially all of which are non-cancerous, contacting the nucleic acid probe under stringent conditions with mRNA of each of said first and second cell samples, and comparing (a) the amount of hybridization of the probe with mRNA of the first cell sample, with (b) the amount of hybridization of the probe with mRNA of the second cell sample, wherein a difference of at least a factor of two, at least a factor of five, at least a factor of twenty, or at least a factor of fifty in the amount of hybridization with the mRNA of the first cell sample as compared to the amount of hybridization with the mRNA of the second cell sample is indicative of the phenotype of cells in the first cell sample.

In another embodiment, the invention provides a test kit for identifying the presence of cancerous cells or tissues, comprising a probe/primer, for measuring a level of a nucleic acid in a sample of cells isolated from a patient. In certain embodiments, the kit may further include instructions for using the kit, solutions for suspending or fixing the cells, detectable tags or labels, solutions for rendering a nucleic acid susceptible to hybridization, solutions for lysing cells, or solutions for the purification of nucleic acids.

In one embodiment, the invention provides a test kit for identifying the presence of cancer cells or tissues, comprising an antibody specific for a protein. In certain embodiments, the kit further includes instructions for using the kit. In certain embodiments, the kit may further include solutions for suspending or fixing the cells, detectable tags or labels, solutions for rendering a polypeptide susceptible to the binding of an antibody, solutions for lysing cells, or solutions for the purification of polypeptides.

In another embodiment, the invention provides a test kit for monitoring the efficacy of a compound or therapeutic in cancerous cells or tissues, comprising a probe/primer, for measuring a level of a nucleic acid in a sample of cells isolated from a patient. In certain embodiments, the kit may further include instructions for using the kit, solutions for suspending or fixing the cells, detectable tags or labels, solutions for rendering a nucleic acid susceptible to hybridization, solutions for lysing cells, or solutions for the purification of nucleic acids.

In one embodiment, the invention provides a test kit for monitoring the efficacy of a compound or therapeutic in cancer cells or tissues, comprising an antibody specific for a protein. In certain embodiments, the kit further includes instructions for using the kit. In certain embodiments, the kit may further include solutions for suspending or fixing the cells, detectable tags or labels, solutions for rendering a polypeptide susceptible to the binding of an antibody, solutions for lysing cells, or solutions for the purification of polypeptides.

This invention is also related to methods of identifying biomarkers comprising the steps of selecting a set of biomarker genes from a gene expression profile representing a disease or drug treatment.

DETAILED DESCRIPTION OF THE INVENTION

It is to be understood that this invention is not limited to the particular methodology, protocols, cell lines, animal species or genera, constructs, and reagents described and as such may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.

It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “a gene” is a reference to one or more genes and includes equivalents thereof known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods, devices, and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods, devices, and materials are now described.

All publications and patents mentioned herein are hereby incorporated herein by reference for the purpose of describing and disclosing, for example, the constructs and methodologies that are described in the publications which might be used in connection with the presently described invention. The publications discussed above and throughout the text are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention.

DEFINITIONS

For convenience, the meanings of certain terms and phrases employed in the specification, examples, and appended claims are provided below.

The phrase “a corresponding normal cell of” or “normal cell corresponding to” or “normal counterpart cell of” a diseased cell refers to a normal cell of the same type as that of the diseased cell.

An “address” on an array (e.g., a microarray) refers to a location at which an element, for example, an oligonucleotide, is attached to the solid surface of the array.

The term “agonist,” as used herein, is meant to refer to an agent that mimics or up-regulates (e.g., potentiates or supplements) the bioactivity of a protein. An agonist may be a wild-type protein or derivative thereof having at least one bioactivity of the wild-type protein. An agonist may also be a compound that up-regulates expression of a gene or which increases at least one bioactivity of a protein. An agonist can also be a compound which increases the interaction of a polypeptide with another molecule, for example, a target peptide or nucleic acid.

“Amplification,” as used herein, relates to the production of additional copies of a nucleic acid sequence. For example, amplification may be carried out using polymerase chain reaction (PCR) technologies which are well known in the art (see, e.g., Dieffenbach, C. W. and G. S. Dveksler (1995) PCR Primer, A Laboratory Manual, Cold Spring Harbor Press, Plainview, N.Y.).

“Antagonist,” as used herein, is meant to refer to an agent that down-regulates (e.g., suppresses or inhibits) at least one bioactivity of a protein. An antagonist may be a compound which inhibits or decreases the interaction between a protein and another molecule, for example, a target peptide or enzyme substrate. An antagonist may also be a compound that down-regulates expression of a gene or which reduces the amount of expressed protein present.

The term “antibody,” as used herein, is intended to include whole antibodies, for example, of any isotype (IgG, IgA, IgM, IgE, etc.), and includes fragments thereof which are also specifically reactive with a vertebrate (e.g., mammalian) protein. Antibodies may be fragmented using conventional techniques and the fragments screened for utility in the same manner as described above for whole antibodies. Thus, the term includes segments of proteolytically-cleaved or recombinantly-prepared portions of an antibody molecule that are capable of selectively reacting with a certain protein. Non-limiting examples of such proteolytic and/or recombinant fragments include Fab, F(ab′)2, Fab′, Fv, and single chain antibodies (scFv) containing a VL and/or VH domain joined by a peptide linker. The scFv's may be covalently or non-covalently linked to form antibodies having two or more binding sites. The subject invention includes polyclonal, monoclonal, or other purified preparations of antibodies and recombinant antibodies.

The terms “array” and “matrix” refer to an arrangement of addressable locations or “addresses” on a device. The locations can be arranged in two-dimensional arrays, three-dimensional arrays, or other matrix formats. The number of locations may range from several to at least hundreds of thousands. Most importantly, each location represents a totally independent reaction site. A “nucleic acid array” refers to an array containing nucleic acid probes, such as oligonucleotides or larger portions of genes. The nucleic acid on the array may be single-stranded. Arrays wherein the probes are oligonucleotides are referred to as “oligonucleotide arrays” or “oligonucleotide chips.” A “microarray,” also referred to herein as a “biochip” or “biological chip,” is an array of regions having a density of discrete regions of at least about 100/cm², or at least about 1000/cm². The regions in a microarray have typical dimensions, for example, diameters, in the range of about 10 to 250 μm, and are separated from other regions in the array by about the same distance.
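As a rough check on how the stated region dimensions relate to the stated densities, the following Python sketch computes the number of regions per square centimeter. It assumes, purely for illustration, that the regions lie on a regular square grid and that the gap between regions equals the region diameter; the function name and the grid layout are assumptions and are not taken from the specification.

```python
# Illustrative arithmetic only; the square-grid layout is an assumption.
UM_PER_CM = 10_000  # 1 cm = 10,000 micrometers


def regions_per_cm2(diameter_um, gap_um=None):
    """Density of a square grid of regions, in regions per square centimeter."""
    if gap_um is None:
        gap_um = diameter_um  # regions separated by about their own diameter
    regions_per_cm = UM_PER_CM / (diameter_um + gap_um)  # regions along one cm
    return regions_per_cm ** 2


print(regions_per_cm2(250))  # 250 um regions, 250 um gaps
print(regions_per_cm2(10))   # 10 um regions, 10 um gaps
```

Under these assumptions, 250 μm regions give about 400 regions/cm² and 10 μm regions give about 250,000 regions/cm², both above the lower bound of about 100/cm² recited above.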

“Biological activity,” “bioactivity,” “activity,” or “biological function,” which are used interchangeably herein, mean an effector or antigenic function that is directly or indirectly performed by a polypeptide (whether in its native or denatured conformation), or by any subsequence thereof. Biological activities include binding to polypeptides, binding to other proteins or molecules, activity as a DNA binding protein, as a transcription regulator, ability to bind damaged DNA, etc. A bioactivity can be modulated by directly affecting the subject polypeptide. Alternatively, a bioactivity can be altered by modulating the level of the polypeptide, such as by modulating expression of the corresponding gene.

The term “biological sample,” as used herein, refers to a sample obtained from an organism or from components (e.g., cells) of an organism. The sample may be of any biological tissue or fluid. The sample may be a “clinical sample” which is a sample derived from a patient. Such samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues such as frozen sections taken for histological purposes.

The term “biomarker” or “marker” encompasses a broad range of intra- and extra-cellular events as well as whole-organism physiological changes. Biomarkers may represent essentially any aspect of cell function, for example, but not limited to, levels or rate of production of signaling molecules, transcription factors, metabolites, and gene transcripts, as well as post-translational modifications of proteins. Biomarkers may include whole genome analysis of transcript levels or whole proteome analysis of protein levels and/or modifications.

A biomarker may also refer to a gene or gene product which is up- or down-regulated in a compound-treated, diseased cell of a subject having the disease compared to an untreated diseased cell. That is, the gene or gene product is sufficiently specific to the treated cell that it may be used, optionally with other genes or gene products, to identify, predict, or detect efficacy of a small molecule. Thus, a biomarker is a gene or gene product that is characteristic of efficacy of a compound in a diseased cell or the response of that diseased cell to treatment by the compound.

A nucleotide sequence is “complementary” to another nucleotide sequence if each of the bases of the two sequences match, that is, are capable of forming Watson-Crick base pairs. The term “complementary strand” is used herein interchangeably with the term “complement.” The complement of a nucleic acid strand may be the complement of a coding strand or the complement of a non-coding strand.
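The definition of complementarity above can be illustrated with a short Python sketch that derives the complementary strand of a DNA sequence by Watson-Crick pairing and tests two sequences for complementarity. The function names are hypothetical, and the sketch is limited to the four unambiguous DNA bases; sequences are written 5′ to 3′, so the complementary strand is returned in reverse order.

```python
# Illustrative sketch only; handles the four unambiguous DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(seq):
    """Return the complementary strand, written 5'->3' (the reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_complementary(seq_a, seq_b):
    """True if every base of seq_a pairs with seq_b in antiparallel alignment."""
    return complement_strand(seq_a) == seq_b.upper()


print(complement_strand("ATGC"))
print(is_complementary("ATGC", "GCAT"))
```

A sequence containing any character other than A, C, G, or T raises a KeyError in this sketch; handling of RNA or ambiguity codes would require extending the lookup table.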

“Detection agents of genes” refers to agents that can be used to specifically detect the gene or other biological molecules relating to it, for example, RNA transcribed from the gene or polypeptides encoded by the gene. Exemplary detection agents are nucleic acid probes, which hybridize to nucleic acids corresponding to the gene, and antibodies.

“Differential gene expression pattern” between cell A and cell B refers to a pattern reflecting the differences in gene expression between cell A and cell B. A differential gene expression pattern may also be obtained between a cell at one time point and a cell at another time point, or between a cell incubated or contacted with a compound and a cell that has not been incubated with or contacted with the compound.
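One possible way to represent a differential gene expression pattern between cell A and cell B is as a mapping from gene name to the log2 ratio of the two expression levels, keeping only genes whose levels differ by at least a chosen factor. The Python sketch below is an assumption-laden illustration: the function name, the gene labels, the example values, and the default two-fold cutoff are all hypothetical.

```python
# Illustrative sketch only; gene names and values are hypothetical.
import math


def differential_pattern(profile_a, profile_b, min_fold=2.0):
    """Map each shared gene to log2(A/B) when the fold difference meets min_fold."""
    cutoff = math.log2(min_fold)
    pattern = {}
    for gene in sorted(profile_a.keys() & profile_b.keys()):
        a, b = profile_a[gene], profile_b[gene]
        if a <= 0 or b <= 0:
            continue  # skip values that cannot be log-transformed
        log_ratio = math.log2(a / b)
        if abs(log_ratio) >= cutoff:
            pattern[gene] = log_ratio  # positive: higher in A; negative: lower in A
    return pattern


treated = {"GENE_A": 800, "GENE_B": 100, "GENE_C": 50}
untreated = {"GENE_A": 100, "GENE_B": 110, "GENE_C": 400}
print(differential_pattern(treated, untreated))
```

The same function could be applied to a cell at two time points, or to a compound-treated and an untreated cell, since only the two sets of expression values change.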

The term “cancer” includes, but is not limited to, solid tumors, such as cancers of the breast, respiratory tract, brain, reproductive organs, digestive tract, urinary tract, eye, liver, skin, head and neck, thyroid, parathyroid, and their distant metastases. The term also includes lymphomas, sarcomas, and leukemias.

Examples of breast cancer include, but are not limited to, invasive ductal carcinoma, invasive lobular carcinoma, ductal carcinoma in situ, and lobular carcinoma in situ.

Examples of cancers of the respiratory tract include, but are not limited to, small-cell and non-small-cell lung carcinoma, as well as bronchial adenoma and pleuropulmonary blastoma.

Examples of brain cancers include, but are not limited to, brain stem and hypothalamic glioma, cerebellar and cerebral astrocytoma, medulloblastoma, ependymoma, as well as neuroectodermal and pineal tumor.

Tumors of the male reproductive organs include, but are not limited to, prostate and testicular cancer. Tumors of the female reproductive organs include, but are not limited to, endometrial, cervical, ovarian, vaginal, and vulvar cancer, as well as sarcoma of the uterus.

Tumors of the digestive tract include, but are not limited to, anal, colon, colorectal, esophageal, gallbladder, gastric, pancreatic, rectal, small-intestine, and salivary gland cancers.

Tumors of the urinary tract include, but are not limited to, bladder, penile, kidney, renal pelvis, ureter, and urethral cancers.

Eye cancers include, but are not limited to, intraocular melanoma and retinoblastoma.

Examples of liver cancers include, but are not limited to, hepatocellular carcinoma (liver cell carcinomas with or without fibrolamellar variant), cholangiocarcinoma (intrahepatic bile duct carcinoma), and mixed hepatocellular cholangiocarcinoma.

Skin cancers include, but are not limited to, squamous cell carcinoma, Kaposi's sarcoma, malignant melanoma, Merkel cell skin cancer, and non-melanoma skin cancer.

Head-and-neck cancers include, but are not limited to, laryngeal/hypopharyngeal/nasopharyngeal/oropharyngeal cancer, and lip and oral cavity cancer.

Lymphomas include, but are not limited to, AIDS-related lymphoma, non-Hodgkin's lymphoma, cutaneous T-cell lymphoma, Hodgkin's disease, and lymphoma of the central nervous system.

Sarcomas include, but are not limited to, sarcoma of the soft tissue, osteosarcoma, malignant fibrous histiocytoma, lymphosarcoma, and rhabdomyosarcoma.

Leukemias include, but are not limited to, acute myeloid leukemia, acute lymphoblastic leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, and hairy cell leukemia.

“A diseased cell of cancer” refers to a cell present in subjects having cancer. That is, a cell which is a modified form of a normal cell and is not present in a subject not having cancer, or a cell which is present in significantly higher or lower numbers in subjects having cancer relative to subjects not having cancer.

The term “equivalent” is understood to include nucleotide sequences encoding functionally equivalent polypeptides. Equivalent nucleotide sequences may include sequences that differ by one or more nucleotide substitutions, additions, or deletions, such as allelic variants; and may, therefore, include sequences that differ from the nucleotide sequence of the nucleic acids referred to in the Tables due to the degeneracy of the genetic code.

The term “expression profile,” which is used interchangeably herein with “gene expression profile” and “fingerprint” of a cell, refers to a set of values representing mRNA levels of one or more genes in a cell. An expression profile may comprise values representing expression levels of at least about 10 genes, or at least about 50, 100, 200 or more genes. Expression profiles may also comprise an mRNA level of a gene which is expressed at similar levels in multiple cells and conditions (e.g., a housekeeping gene such as GAPDH). For example, an expression profile of a diseased cell of cancer refers to a set of values representing mRNA levels of 10 or more genes in a diseased cell.

The term “gene” refers to a nucleic acid sequence that comprises control and coding sequences necessary for the production of a polypeptide or precursor. The polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence. The gene may be derived in whole or in part from any source known to the art, including a plant, a fungus, an animal, a bacterial genome or episome, eukaryotic, nuclear or plasmid DNA, cDNA, viral DNA, or chemically synthesized DNA. A gene may contain one or more modifications in either the coding or the untranslated regions which could affect the biological activity or the chemical structure of the expression product, the rate of expression, or the manner of expression control. Such modifications include, but are not limited to, mutations, insertions, deletions, and substitutions of one or more nucleotides. The gene may constitute an uninterrupted coding sequence or it may include one or more introns, bound by the appropriate splice junctions.

“Hybridization” refers to any process by which a strand of nucleic acid binds with a complementary strand through base pairing. For example, two single-stranded nucleic acids “hybridize” when they form a double-stranded duplex. The region of double-strandedness may include the full-length of one or both of the single-stranded nucleic acids, or all of one single-stranded nucleic acid and a subsequence of the other single-stranded nucleic acid, or the region of double-strandedness may include a subsequence of each nucleic acid. Hybridization also includes the formation of duplexes which contain certain mismatches, provided that the two strands are still forming a double-stranded helix. “Stringent hybridization conditions” refers to hybridization conditions resulting in essentially specific hybridization.

The term “isolated,” as used herein, with respect to nucleic acids, such as DNA or RNA, refers to molecules separated from other DNAs or RNAs, respectively, that are present in the natural source of the macromolecule. The term “isolated” as used herein also refers to a nucleic acid or peptide that is substantially free of cellular material, viral material, culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized. Moreover, an “isolated nucleic acid” may include nucleic acid fragments which are not naturally occurring as fragments and would not be found in the natural state. The term “isolated” is also used herein to refer to polypeptides which are isolated from other cellular proteins and is meant to encompass both purified and recombinant polypeptides.

As used herein, the terms “label” and “detectable label” refer to a molecule capable of detection, including, but not limited to, radioactive isotopes, fluorophores, chemiluminescent moieties, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, dyes, metal ions, ligands (e.g., biotin or haptens), and the like. The term “fluorescer” refers to a substance or a portion thereof which is capable of exhibiting fluorescence in the detectable range. Particular examples of labels which may be used in the present invention include fluorescein, rhodamine, dansyl, umbelliferone, Texas red, luminol, NADPH, alpha-beta-galactosidase, and horseradish peroxidase.

The phrase “level of expression” refers to the level of mRNA, as well as pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s), and degradation products, encoded by a gene in the cell. The phrase “level of expression” also refers to the level of protein or polypeptide in a cell.

As used herein, the term “nucleic acid” refers to polynucleotides such as deoxyribonucleic acid (DNA) and, where appropriate, ribonucleic acid (RNA). The term should also be understood to include, as equivalents, analogs of either RNA or DNA made from nucleotide analogs and, as applicable to the embodiment being described, single-stranded (sense or antisense) and double-stranded polynucleotides. Chromosomes, cDNAs, mRNAs, rRNAs, and ESTs are representative examples of molecules that may be referred to as nucleic acids.

The phrase “nucleic acid corresponding to a gene” refers to a nucleic acid that can be used for detecting the gene, for example, a nucleic acid which is capable of hybridizing specifically to the gene.

The phrase “nucleic acid sample derived from RNA” refers to one or more nucleic acid molecules (e.g., RNA or DNA) that may be synthesized from the RNA, and includes DNA produced from methods using PCR (e.g., RT-PCR).

The term “oligonucleotide” as used herein refers to a nucleic acid molecule comprising, for example, from about 10 to about 1000 nucleotides. Oligonucleotides for use in the present invention may be from about 15 to about 150 nucleotides, or from about 150 to about 1000 in length. The oligonucleotide may be a naturally occurring oligonucleotide or a synthetic oligonucleotide. Oligonucleotides may be prepared by the phosphoramidite method (Beaucage and Carruthers, Tetrahedron Lett. 22:1859-62, 1981), or by the triester method (Matteucci, et al., J. Am. Chem. Soc. 103:3185, 1981), or by other chemical methods known in the art.

The term “patient” or “subject” as used herein includes mammals (e.g., humans and animals).

The term “percent identical” refers to sequence identity between two amino acid sequences or between two nucleotide sequences. For example, identity between two sequences may be determined by comparing a particular position in each sequence which may be aligned for purposes of comparison. When an equivalent position in the compared sequences is occupied by the same base or amino acid, then the molecules are identical at that position. When the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), then the molecules may be referred to as homologous (similar) at that position. Expression as a percentage of homology, similarity, or identity refers to a function of the number of identical or similar amino acids at positions shared by the compared sequences. Various alignment algorithms and/or programs may be used including, for example, FASTA, BLAST, or ENTREZ. FASTA and BLAST are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and may be used with, for example, default settings. ENTREZ is available through the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Md. In one embodiment, the percent identity of two sequences may be determined by the GCG program with a gap weight of 1 (e.g., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences). Other techniques for alignment are described in Methods in Enzymology (vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, Calif., USA). An alignment program that permits gaps in the sequence may be utilized to align the sequences. For example, the Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments (see, e.g., Meth. Mol. Biol. 70:173-187, 1997).
Also, the GAP program using the Needleman and Wunsch alignment method may be utilized to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves the ability to detect distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors. Nucleic acid-encoded amino acid sequences may be used to search both protein and DNA databases. Databases with individual sequences are described in Methods in Enzymology, ed. Doolittle, supra. Databases include, for example, Genbank, EMBL, and DNA Database of Japan (DDBJ).
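As a minimal sketch of the position-by-position comparison described above, the following computes percent identity for two sequences that have already been aligned. It assumes the inputs are equal-length strings in which `-` marks a gap, and it treats every gap position as a single mismatch, in the spirit of the gap weight of 1 mentioned above. The function name `percent_identity` is hypothetical, and the alignment itself (e.g., by a Smith-Waterman or Needleman and Wunsch method) is outside the scope of the sketch.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions occupied by the same base or residue.

    Assumes both inputs are pre-aligned, equal-length strings in which '-'
    marks a gap. Any position containing a gap counts as one mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

Under these assumptions, `percent_identity("ACGT", "ACGA")` and `percent_identity("AC-T", "ACGT")` each give 75.0, since three of four aligned positions are identical in both cases.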

As used herein, a nucleic acid or other molecule attached to an array is referred to as a “probe” or “capture probe.” When an array contains several probes corresponding to one gene, these probes are referred to as a “gene-probe set.” A gene-probe set may consist of, for example, about 2 to about 20 probes, from about 2 to about 10 probes, or about 5 probes.

The “profile” of a cell's biological state refers to the levels of various constituents of a cell that are known to change in response to drug treatments and other perturbations of the biological state of the cell. Constituents of a cell include, for example, levels of RNA, levels of protein abundances, or protein activity levels.

The terms “protein,” “polypeptide,” and “peptide” are used interchangeably herein when referring to a gene product.

An expression profile in one cell is “similar” to an expression profile in another cell when the levels of expression of the genes in the two profiles are sufficiently similar that the similarity is indicative of a common characteristic, for example, the same type of cell. Accordingly, the expression profiles of a first cell and a second cell are similar when at least 75% of the genes that are expressed in the first cell are expressed in the second cell at a level that is within a factor of two relative to the first cell.
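The similarity criterion above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: each profile is represented as a dictionary mapping a gene identifier to an expression level, a gene is treated as “expressed” when its level is greater than zero, and the function name `profiles_similar` and its parameter names are hypothetical.

```python
def profiles_similar(first, second, min_fraction=0.75, max_fold=2.0):
    """Return True when enough genes expressed in `first` are expressed in
    `second` at a level within `max_fold` of the level in `first`.

    first, second: dict mapping gene identifier -> expression level.
    Assumption: a gene is "expressed" when its level is greater than zero.
    """
    expressed = [gene for gene, level in first.items() if level > 0]
    if not expressed:
        return False
    within = 0
    for gene in expressed:
        level_first = first[gene]
        level_second = second.get(gene, 0.0)
        # Within a factor of two means between half and double the first level.
        if level_second > 0 and (
            level_first / max_fold <= level_second <= level_first * max_fold
        ):
            within += 1
    return within / len(expressed) >= min_fraction
```

With four expressed genes, three of which fall within a factor of two in the second cell, the fraction is exactly 75% and the profiles are treated as similar.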

“Small molecule,” as used herein, refers to a composition with a molecular weight of less than about 5 kD or less than about 4 kD. Small molecules can be nucleic acids, peptides, polypeptides, peptidomimetics, carbohydrates, lipids, or other organic or inorganic molecules. Many pharmaceutical companies have extensive libraries of chemical and/or biological mixtures, often fungal, bacterial, or algal extracts, which can be screened with any of the assays of the invention to identify compounds that modulate a bioactivity.

The term “specific hybridization” of a probe to a target site of a template nucleic acid refers to hybridization of the probe predominantly to the target, such that the hybridization signal can be clearly interpreted. As further described herein, such conditions resulting in specific hybridization vary depending on the length of the region of homology, the GC content of the region, and the melting temperature (“Tm”) of the hybrid. Thus, hybridization conditions may vary in salt content, acidity, and temperature of the hybridization solution and the washes.

A “variant” of polypeptide refers to a polypeptide having an amino acid sequence in which one or more amino acid residues is altered. The variant may have “conservative” changes, wherein a substituted amino acid has similar structural or chemical properties (e.g., replacement of leucine with isoleucine). A variant may also have “nonconservative” changes (e.g., replacement of glycine with tryptophan). Analogous minor variations may include amino acid deletions or insertions, or both. Guidance in determining which amino acid residues may be substituted, inserted, or deleted without abolishing biological or immunological activity may be identified using computer programs well known in the art, for example, LASERGENE software (DNASTAR).

The term “variant,” when used in the context of a polynucleotide sequence, may encompass a polynucleotide sequence related to that of a particular gene or the coding sequence thereof. This definition may also include, for example, “allelic,” “splice,” “species,” or “polymorphic” variants. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or an absence of domains. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass “single nucleotide polymorphisms” (SNPs) in which the polynucleotide sequence varies by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.

Microarrays for Determining the Level of Expression of Genes Characteristic of Human Lung Tumor Tissue

Generally, determining expression profiles with microarrays involves the following steps: (a) obtaining an mRNA sample from a subject and preparing labeled nucleic acids therefrom (the “target nucleic acids” or “targets”); (b) contacting the target nucleic acids with an array under conditions sufficient for the target nucleic acids to bind to the corresponding probes on the array, for example, by hybridization or specific binding; (c) optionally removing unbound targets from the array; (d) detecting the bound targets; and (e) analyzing the results, for example, using computer-based analysis methods. As used herein, “nucleic acid probes” or “probes” are nucleic acids attached to the array, whereas “target nucleic acids” are nucleic acids that are hybridized to the array. Each of these steps is described in more detail below.

Nucleic acid specimens may be obtained from an individual to be tested using either “invasive” or “non-invasive” sampling means. A sampling means is said to be “invasive” if it involves the collection of nucleic acids from within the skin or organs of an animal (including murine, human, ovine, equine, bovine, porcine, canine, or feline animal). Examples of invasive methods include blood collection, semen collection, needle biopsy, pleural aspiration, umbilical cord biopsy, etc. Examples of such methods are discussed by Kim, et al., (J. Virol. 66:3879-3882, 1992); Biswas, et al., (Ann. NY Acad. Sci. 590:582-583, 1990); and Biswas, et al., (J. Clin. Microbiol. 29:2228-2233, 1991).

In contrast, a “non-invasive” sampling means is one in which the nucleic acid molecules are recovered from an internal or external surface of the animal. Examples of such “non-invasive” sampling means include, for example, “swabbing,” collection of tears, saliva, urine, fecal material, sweat or perspiration, hair, etc.

In one embodiment of the present invention, one or more cells from the subject to be tested are obtained and RNA is isolated from the cells. In one embodiment, a sample of peripheral blood leukocytes (PBLs) is obtained from the subject. It is also possible to obtain a cell sample from a subject, and then to enrich the sample for a desired cell type. For example, cells may be isolated from other cells using a variety of techniques, such as isolation with an antibody binding to an epitope on the cell surface of the desired cell type. Where the desired cells are in a solid tissue, particular cells may be dissected, for example, by microdissection or by laser capture microdissection (LCM) (see, e.g., Bonner, et al., Science 278:1481, 1997; Emmert-Buck, et al., Science 274:998, 1996; Fend, et al., Am. J. Path. 154:61, 1999; and Murakami, et al., Kidney Int. 58:1346, 2000).

RNA may be extracted from tissue or cell samples by a variety of methods, for example, guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin, et al., Biochemistry 18:5294-5299, 1979). RNA from single cells may be obtained as described in methods for preparing cDNA libraries from single cells (see, e.g., Dulac, Curr. Top. Dev. Biol. 36:245, 1998; Jena, et al., J. Immunol. Methods 190:199, 1996).

The RNA sample can be further enriched for a particular species. In one embodiment, for example, poly(A)+ RNA may be isolated from an RNA sample. In particular, poly-T oligonucleotides may be immobilized on a solid support to serve as affinity ligands for mRNA. Kits for this purpose are commercially available, for example, the MessageMaker kit (Life Technologies, Grand Island, N.Y.).

In one embodiment, the RNA population may be enriched for sequences of interest, such as the genes characteristic of human lung tumor tissue (e.g., SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399). Enrichment may be accomplished, for example, by primer-specific cDNA synthesis, or multiple rounds of linear amplification based on cDNA synthesis and template-directed in vitro transcription (see, e.g., Wang, et al., Proc. Natl. Acad. Sci. USA 86:9717, 1989; Dulac, et al., supra; Jena, et al., supra).

The population of RNA, enriched or not in particular species or sequences, may be further amplified. Such amplification is particularly important when using RNA from a single cell or a few cells. A variety of amplification methods are suitable for use in the methods of the present invention, including, for example, PCR; ligase chain reaction (LCR) (see, e.g., Wu and Wallace, Genomics 4:560, 1989; Landegren, et al., Science 241:1077, 1988); self-sustained sequence replication (SSR) (see, e.g., Guatelli, et al., Proc. Natl. Acad. Sci. USA 87:1874, 1990); nucleic acid based sequence amplification (NASBA) and transcription amplification (see, e.g., Kwoh, et al., Proc. Natl. Acad. Sci. USA 86:1173, 1989). Methods for PCR technology are well known in the art (see, e.g., PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, N.Y., N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Mattila, et al., Nucleic Acids Res. 19:4967, 1991; Eckert, et al., PCR Methods and Applications 1:17, 1991; PCR (eds. McPherson, et al., IRL Press, Oxford); and U.S. Pat. No. 4,683,202). Methods of amplification are described, for example, by Ohyama, et al., (BioTechniques 29:530, 2000); Luo, et al., (Nat. Med. 5:117, 1999); Hegde, et al., (BioTechniques 29:548, 2000); Kacharmina, et al., (Meth. Enzymol. 303:3, 1999); Livesey, et al., (Curr. Biol. 10:301, 2000); Spirin, et al., (Invest. Ophthalmol. Vis. Sci. 40:3108, 1999); and Sakai, et al., (Anal. Biochem. 287:32, 2000). RNA amplification and cDNA synthesis may also be conducted in cells in situ (see, e.g., Eberwine, et al., Proc. Natl. Acad. Sci. USA 89:3010, 1992).

Generally, the target molecules will be labeled to permit detection of hybridization of the target molecules to a microarray. That is, the probe may comprise a member of a signal producing system and thus, is detectable, either directly or through combined action with one or more additional members of a signal producing system. Examples of directly detectable labels include isotopic and fluorescent moieties incorporated, usually by a covalent bond, into a moiety of the probe, such as a nucleotide monomeric unit (e.g., dNMP of the primer), or a photoactive or chemically active derivative of a detectable label which can be bound to a functional moiety of the probe molecule.

Nucleic acids may be labeled during or after enrichment and/or amplification of RNAs. For example, reverse transcription may be carried out in the presence of a dNTP conjugated to a detectable label, for example, a fluorescently labeled dNTP. In another embodiment, the cDNA or RNA probe may be synthesized in the absence of detectable label and may be labeled subsequently, for example, by incorporating biotinylated dNTPs or rNTPs, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.

Fluorescent moieties or labels of interest include coumarin and its derivatives (e.g., 7-amino-4-methylcoumarin, aminocoumarin); bodipy dyes such as Bodipy FL and cascade blue; fluorescein and its derivatives (e.g., fluorescein isothiocyanate, Oregon green); rhodamine dyes (e.g., Texas red, tetramethylrhodamine); eosins and erythrosins; cyanine dyes (e.g., Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7); FluorX, macrocyclic chelates of lanthanide ions (e.g., quantum dye™); fluorescent energy transfer dyes such as thiazole orange-ethidium heterodimer, TOTAB, dansyl, etc. Individual fluorescent compounds which have functionalities for linking to an element desirably detected in an apparatus or assay of the invention, or which may be modified to incorporate such functionalities, may also be utilized (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif.).

Chemiluminescent labels include luciferin and 2,3-dihydrophthalazinediones, for example, luminol.

Labels may also be members of a signal producing system that act in concert with one or more additional members of the same system to provide a detectable signal. Illustrative of such labels are members of a specific binding pair, such as ligands, for example, biotin, fluorescein, digoxigenin, antigen, polyvalent cations, chelator groups and the like. Members may specifically bind to additional members of the signal producing system, and the additional members may provide a detectable signal either directly or indirectly, for example, an antibody conjugated to a fluorescent moiety or an enzymatic moiety capable of converting a substrate to a chromogenic product (e.g., alkaline phosphatase conjugate antibody and the like).

Additional labels of interest include those that provide a signal only when the probe with which it is associated is specifically bound to a target molecule. Such labels include “molecular beacons” as described in Tyagi and Kramer (Nature Biotech. 14:303, 1996) and EP 0 070 685 B1. Other labels of interest include those described in U.S. Pat. No. 5,563,037; WO 97/17471; and WO 97/17076.

In other embodiments, the target nucleic acid may not be labeled. In this case, hybridization may be determined, for example, by plasmon resonance (see, e.g., Thiel, et al. Anal. Chem. 69:4948, 1997).

In one embodiment, a plurality (e.g., 2, 3, 4, 5, or more) of sets of target nucleic acids are labeled and used in one hybridization reaction (“multiplex” analysis). For example, one set of nucleic acids may correspond to RNA from one cell and another set of nucleic acids may correspond to RNA from another cell. The plurality of sets of nucleic acids may be labeled with different labels, for example, different fluorescent labels (e.g., fluorescein and rhodamine) which have distinct emission spectra so that they can be distinguished. The sets may then be mixed and hybridized simultaneously to one microarray (see, e.g., Schena, et al., Science 270:467-470, 1995).

Examples of distinguishable labels for use when hybridizing a plurality of target nucleic acids to one array are well known in the art and include: two or more different emission wavelength fluorescent dyes such as Cy3 and Cy5; combination of fluorescent proteins and dyes such as phycoerythrin and Cy5; two or more isotopes with different energy of emission such as ³²P and ³³P; gold or silver particles with different scattering spectra; labels which generate signals under different treatment conditions such as temperature, pH, treatment with additional chemical agents, etc.; or labels which generate signals at different time points after treatment. Using one or more enzymes for signal generation allows for the use of an even greater variety of distinguishable labels, based on different substrate specificity of enzymes (e.g., alkaline phosphatase/peroxidase).

The quality of labeled nucleic acids may be evaluated prior to hybridization to an array. In one embodiment, the GeneChip® Test3 Array from Affymetrix (Santa Clara, Calif.) may be used for that purpose. This array contains probes representing a subset of characterized genes from several organisms including mammals. Thus, the quality of a labeled nucleic acid sample can be determined by hybridization of a fraction of the sample to an array.

Microarrays for use according to the invention include one or more probes of genes characteristic of human lung tumor tissue. In one embodiment, the microarray comprises probes corresponding to one or more of genes selected from the group consisting of genes which are up-regulated in cancer and genes which are down-regulated in cancer. The microarray may comprise probes corresponding to at least 10, at least 20, at least 50, at least 100 or at least 1000 genes characteristic of human lung tumor tissue. The microarray may comprise probes corresponding to each gene listed in the Tables.

There may be one or more than one probe corresponding to each gene on a microarray. For example, a microarray may contain from 2 to 20 probes corresponding to one gene, or about 5 to 10 probes. The probes may correspond to the full-length RNA sequence or complement thereof of genes characteristic of human lung tumor tissue, or the probe may correspond to a portion thereof, which portion is of sufficient length to permit specific hybridization. Such probes may comprise from about 50 nucleotides to about 100, 200, 500, or 1000 nucleotides or more than 1000 nucleotides. As further described herein, microarrays may contain oligonucleotide probes, consisting of about 10 to 50 nucleotides, about 15 to 30 nucleotides, or about 20 to 25 nucleotides. The probes may be single-stranded and will have sufficient complementarity to their targets to provide for the desired level of sequence specific hybridization.

Typically, the arrays used in the present invention will have a site density of greater than 100 different probes per cm². The arrays may have a site density of, for example, greater than 500/cm², greater than about 1000/cm², or greater than about 10,000/cm². The arrays may have, for example, more than 100 different probes on a single substrate, greater than about 1000 different probes, greater than about 10,000 different probes, or greater than 100,000 different probes on a single substrate.

A number of different microarray configurations and methods for their production are known to those of skill in the art and are disclosed in U.S. Pat. Nos. 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,445,934; 5,556,752; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,561,071; 5,571,639; 5,593,839; 5,624,711; 5,700,637; 5,744,305; 5,770,456; 5,770,722; 5,837,832; 5,856,101; 5,874,219; 5,885,837; 5,919,523; 6,022,963; 6,077,674; and 6,156,501; Schena, et al., Tibtech 16:301, 1998; Duggan, et al., Nat. Genet. 21:10, 1999; Bowtell, et al., Nat. Genet. 21:25, 1999; Lipshutz, et al., Nat. Genet. 21:20-24, 1999; Blanchard, et al., Biosensors and Bioelectronics 11:687-90, 1996; Maskos, et al., Nucleic Acids Res. 21:4663-69, 1993; Hughes, et al., Nat. Biotechnol. 19:342, 2001; the disclosures of which are herein incorporated by reference. Patents describing methods of using arrays in various applications include: U.S. Pat. Nos. 5,143,854; 5,288,644; 5,324,633; 5,432,049; 5,470,710; 5,492,806; 5,503,980; 5,510,270; 5,525,464; 5,547,839; 5,580,732; 5,661,028; 5,848,659; and 5,874,219; the disclosures of which are herein incorporated by reference.

Arrays may also include control and reference nucleic acids. Control nucleic acids include, for example, prokaryotic genes such as bioB, bioC and bioD, cre from P1 bacteriophage or polyA controls, such as dap, lys, phe, thr, and trp. Reference nucleic acids allow the normalization of results from one experiment to another and the comparison of multiple experiments on a quantitative level. Exemplary reference nucleic acids include housekeeping genes of known expression levels, for example, GAPDH, hexokinase, and actin.
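One way reference nucleic acids might be used for normalization is sketched below. This is a minimal illustration, not a prescribed method: it assumes each experiment is a dictionary mapping a gene identifier to a raw signal, and it rescales every signal by the mean signal of the reference genes measured in the same experiment. The function name `normalize_to_reference` and the identifiers `ACTB` and `geneX` are hypothetical labels chosen for the example.

```python
from statistics import mean


def normalize_to_reference(signals, reference_genes):
    """Divide each raw signal by the mean signal of the reference genes.

    signals: dict mapping gene identifier -> raw signal from one experiment.
    reference_genes: identifiers of housekeeping genes present on the array.
    """
    reference_values = [signals[g] for g in reference_genes if g in signals]
    if not reference_values:
        raise ValueError("no reference genes found in this experiment")
    scale = mean(reference_values)
    if scale <= 0:
        raise ValueError("reference signal must be positive")
    return {gene: value / scale for gene, value in signals.items()}


# Two experiments with different overall signal become directly comparable.
experiment_1 = {"GAPDH": 200.0, "ACTB": 100.0, "geneX": 300.0}
experiment_2 = {"GAPDH": 400.0, "ACTB": 200.0, "geneX": 600.0}
normalized_1 = normalize_to_reference(experiment_1, ["GAPDH", "ACTB"])
normalized_2 = normalize_to_reference(experiment_2, ["GAPDH", "ACTB"])
```

In this example the second experiment has twice the overall signal of the first, and after normalization the hypothetical `geneX` has the same value in both.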

In one embodiment, an array of oligonucleotides may be synthesized on a solid support. Exemplary solid supports include glass, plastics, polymers, metals, metalloids, ceramics, organics, etc. Using chip masking technologies and photoprotective chemistry, it is possible to generate ordered arrays of nucleic acid probes. These arrays, which are known, for example, as “DNA chips” or very large scale immobilized polymer arrays (“VLSIPS™” arrays), may include millions of defined probe regions on a substrate having an area of about 1 cm² to several cm², thereby incorporating from a few to millions of probes (see, e.g., U.S. Pat. No. 5,631,734).

A nucleic acid probe may be at least, for example, about 10, 15, 20, 25, 30, 50, 100 or more nucleotides, and may comprise the full-length gene. For example, probes may be those that hybridize specifically to the genes listed in the Tables.

Nucleic acid probes may be obtained, for example, by PCR amplification of gene segments from genomic, cDNA (e.g., RT-PCR), or cloned sequences. cDNA probes may be prepared according to methods known in the art and further described herein, for example, by reverse-transcription PCR (RT-PCR) of RNA using sequence specific primers. Sequences of genes or cDNA from which probes are generated may be obtained, for example, from GenBank, other public databases, or publications.

Oligonucleotide probes may also be synthesized by standard methods known in the art, for example, by automated DNA synthesizer or any other chemical method. As an example, phosphorothioate oligonucleotides may be synthesized by the method of Stein, et al., (Nucl. Acids Res. 16:3209, 1988), and methylphosphonate oligonucleotides may be prepared by controlled pore glass polymer supports (see, e.g., Sarin, et al., Proc. Natl. Acad. Sci. U.S.A. 85:7448-7451, 1988). In another embodiment, the oligonucleotide may be a 2′-O-methylribonucleotide (Inoue, et al., Nucl. Acids Res. 15:6131-6148, 1987), or a chimeric RNA-DNA analog (Inoue, et al., FEBS Lett. 215:327-330, 1987).

Nucleic acid probes may be natural nucleic acids or chemically modified nucleic acids (e.g., composed of nucleotide analogs); however, the probes should possess activated hydroxyl groups compatible with the linking chemistry. The protective groups may be photolabile, or the protective groups may be labile under certain chemical conditions (e.g., acid). The surface of the solid support may contain a composition that generates acids upon exposure to light. Thus, exposure of a region of the substrate to light generates acids in that region that remove the protective groups in the exposed region. Also, the synthesis method may use 3′-protected 5′-O-phosphoramidite-activated deoxynucleoside. In this case, the oligonucleotide is synthesized in the 5′ to 3′ direction, which results in a free 5′ end.

In one embodiment of the present invention, oligonucleotides of an array may be synthesized using a 96-well automated multiplex oligonucleotide synthesizer (A.M.O.S.) that is capable of producing thousands of oligonucleotides (see, e.g., Lashkari, et al., Proc. Natl. Acad. Sci. USA 92:7912, 1995).

To compare expression levels, labeled nucleic acids may be contacted with the array under conditions sufficient for binding between the target nucleic acid and the probe on the array. In one embodiment, the hybridization conditions may be selected to provide for the desired level of hybridization specificity; that is, conditions sufficient for hybridization to occur between the labeled nucleic acids and probes on the microarray.

Hybridization may be carried out in conditions permitting essentially specific hybridization. The length and GC content of the nucleic acid will determine the thermal melting point and thus, the hybridization conditions necessary for obtaining specific hybridization of the probe to the target nucleic acid. These factors are well known to a person of skill in the art, and may also be tested in assays. An extensive guide to nucleic acid hybridization may be found in Tijssen, et al. (Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993)). Generally, stringent conditions may be selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Highly stringent conditions may be selected to be equal to the Tm point for a particular probe. Sometimes the term “dissociation temperature” (Td) is used to define the temperature at which at least half of the probe dissociates from a perfectly matched target nucleic acid. In any case, a variety of techniques for estimating the Tm or Td are available, and generally are described in Tijssen, supra. Typically, G-C base pairs in a duplex are estimated to contribute about 3° C. to the Tm, while A-T base pairs are estimated to contribute about 2° C., up to a theoretical maximum of about 80-100° C. However, more sophisticated models of Tm and Td are available in which G-C stacking interactions, solvent effects, the desired assay temperature, and the like are taken into account.
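As a rough illustration of the base-pair rule of thumb described above, the following Python sketch estimates the Tm of a probe from its G-C and A-T content and derives a stringent hybridization temperature about 5° C. below that estimate. The function names, the default contributions (3° C. and 2° C., taken from the preceding paragraph), and the cap parameter are illustrative assumptions; the more sophisticated models that account for stacking interactions and solvent effects are not represented.

```python
def estimate_tm(probe, gc_contrib=3.0, at_contrib=2.0, max_tm=100.0):
    """Rule-of-thumb Tm in degrees C from per-base-pair contributions.

    The defaults follow the approximate values stated in the text;
    max_tm is an assumed cap reflecting the stated theoretical maximum.
    """
    seq = probe.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("probe must contain only A, C, G, and T")
    return min(gc * gc_contrib + at * at_contrib, max_tm)


def stringent_temperature(probe, offset=5.0):
    """Hybridization temperature about `offset` degrees C below the Tm."""
    return estimate_tm(probe) - offset
```

Under these assumptions, a probe with a higher G-C fraction receives a higher estimated Tm and therefore a higher stringent hybridization temperature than an A-T rich probe of the same length.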

In one embodiment, non-specific binding or background signal may be reduced by the use of a detergent (e.g., C-TAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA, etc.) during the hybridization. In one embodiment, the hybridization may be performed in the presence of about 0.5 mg/ml DNA (e.g., herring sperm DNA). The use of blocking agents in hybridization is well known to those of skill in the art (see, e.g., Tijssen, supra).

If the target sequences are detected using the same label, different arrays may be employed for each physiological source or the same array may be screened multiple times. The above methods may be varied to provide for multiplex analysis by employing different and distinguishable labels for the different target populations (e.g., different physiological sources). According to this multiplex method, the same array may be used at the same time for each of the different target populations.

The methods described above result in the production of hybridization patterns of labeled target nucleic acids on the array surface. The resultant hybridization patterns of labeled nucleic acids may be visualized or detected in a variety of ways, with the particular manner of detection selected based on the particular label of the target nucleic acid. Representative detection means include scintillation counting, autoradiography, fluorescence measurement, colorimetric measurement, light emission measurement, light scattering, and the like.

One such method of detection utilizes an array scanner that is commercially available (Affymetrix, Santa Clara, Calif.), for example, the 417™ Arrayer, the 418™ Array Scanner, or the Agilent GeneArray™ Scanner. This scanner is controlled from a system computer with an interface and easy-to-use software tools. The output may be directly imported into or directly read by a variety of software applications. Scanning devices are described in, for example, U.S. Pat. Nos. 5,143,854 and 5,424,186.

For fluorescent labeled probes, the fluorescence emissions at each site of a transcript array may be detected by scanning confocal laser microscopy. Alternatively, a laser may be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores may be analyzed simultaneously (see, e.g., Shalon, et al., Genome Res. 6:639-645, 1996). For example, the arrays may be scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Fluorescence laser scanning devices are described in Shalon, et al., supra.

Following the data gathering operation, the data will typically be reported to a data analysis operation. To facilitate the sample analysis operation, the data obtained by the reader from the device may be analyzed using a digital computer. Typically, the computer will be appropriately programmed for receipt and storage of the data from the device, as well as for analysis and reporting of the data gathered, for example, subtraction of the background, deconvolution of multi-color images, flagging or removing artifacts, verifying that controls have performed properly, normalizing the signals, interpreting fluorescence data to determine the amount of hybridized target, normalization of background and single base mismatch hybridizations, and the like.
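Two of the analysis steps listed above, subtraction of the background and normalization of the signals, may be sketched as follows. This is a minimal Python sketch under stated assumptions: per-spot background values are assumed to be available from the reader, negative results are floored at zero, and scaling to the array median is only one of many possible normalization schemes and is not prescribed by the text. All function names are hypothetical.

```python
from statistics import median


def background_subtract(signal, background, floor=0.0):
    """Subtract per-spot background; clip negative results to `floor`."""
    if len(signal) != len(background):
        raise ValueError("signal and background must have equal length")
    return [max(s - b, floor) for s, b in zip(signal, background)]


def normalize_to_median(values, target=1.0):
    """Scale values so that the array median equals `target` (assumed scheme)."""
    m = median(values)
    if m <= 0:
        raise ValueError("median must be positive to normalize")
    return [v * target / m for v in values]


# Hypothetical raw intensities and local backgrounds for three spots.
corrected = background_subtract([100.0, 50.0, 10.0], [20.0, 20.0, 20.0])
normalized = normalize_to_median(corrected)
```

Median scaling is chosen here only because it is insensitive to a few very bright spots; control-based normalization would follow the same pattern with a different scaling factor.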

In one embodiment, a system comprises a search function that allows one to search for specific patterns, for example, patterns relating to differential gene expression, for example, between the expression profile of a cancer cell and the expression profile of a counterpart normal cell in a subject. For example, a system allows one to search for patterns of gene expression between more than two samples.

Various algorithms are available for analyzing gene expression profile data, for example, for determining the type of comparisons to perform. In certain embodiments, it is desirable to group genes that are co-regulated. This allows for the comparison of large numbers of profiles. One embodiment for identifying such groups of genes involves clustering algorithms (for reviews of clustering algorithms, see, e.g., Fukunaga, 1990, Statistical Pattern Recognition, 2nd Ed., Academic Press, San Diego; Everitt, 1974, Cluster Analysis, London: Heinemann Educ. Books; Hartigan, 1975, Clustering Algorithms, New York: Wiley; Sneath and Sokal, 1973, Numerical Taxonomy, Freeman; Anderberg, 1973, Cluster Analysis for Applications, Academic Press: New York).

Clustering may be based on other characteristics of the genes, for example, their level of expression (see, e.g., U.S. Pat. No. 6,203,987), or permit clustering of time curves (see, e.g. U.S. Pat. No. 6,263,287). Examples of clustering algorithms include K-means clustering and hierarchical clustering. Clustering may also be achieved by visual inspection of gene expression data using a graphical representation of the data (e.g. a “heat map”). An example of software which contains clustering algorithms and a means to graphically represent gene expression data is Spotfire DecisionSite (Spotfire, Inc., Somerville, Mass. and Goteborg, Sweden).
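By way of illustration, K-means clustering of gene expression profiles may be sketched in Python as shown below. This is a simplified sketch and not the implementation of any named software package: each profile is assumed to be a list of expression values for one gene across several samples, squared Euclidean distance is assumed as the similarity measure, and the initial centroids are simply the first k profiles so that the result is reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Group profiles into k clusters; returns (labels, centroids)."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Assumption: deterministic start from the first k profiles.
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles for four genes measured in three samples.
genes = [[1.0, 1.1, 0.9], [5.0, 5.2, 4.8], [1.2, 0.9, 1.0], [4.9, 5.1, 5.0]]
labels, centroids = kmeans(genes, k=2)
```

In this toy input the first and third genes fall into one cluster and the second and fourth into the other, which is the grouping of co-regulated genes that the clustering step is intended to reveal.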

Comparison of the expression levels of one or more genes characteristic of human lung tumor tissue with reference expression levels, for example, expression levels in diseased cells of cancer or in normal counterpart cells, may be conducted using computer systems. In one embodiment, expression levels may be obtained from two cells and these two sets of expression levels may be introduced into a computer system for comparison. For example, one set of expression levels is entered into a computer system for comparison with values that are already present in the computer system, or in computer-readable form that is then entered into the computer system.

In one embodiment, the computer system may also contain a database comprising values representing levels of expression of one or more genes characteristic of human lung tumor tissue. The database may contain one or more expression profiles of genes characteristic of human lung tumor tissue in different cells.

In another embodiment, the invention provides a computer-readable form of the gene expression profile data, or of values corresponding to the level of expression of at least one gene characteristic of cancer in a diseased cell. The values may be mRNA expression levels obtained from experiments, for example, microarray analysis. The values may also be mRNA levels normalized relative to a reference gene whose expression is constant in numerous cells under numerous conditions (e.g., GAPDH). In other embodiments, the values in the computer may be ratios of, or differences between, normalized or non-normalized mRNA levels in different samples.
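The stored values described above may be derived as in the following Python sketch, which normalizes mRNA levels to a reference gene and then forms ratios between two samples. The gene names other than GAPDH, the numeric levels, and the function names are hypothetical and serve only to illustrate the arithmetic.

```python
def normalize_to_reference(levels, reference_gene="GAPDH"):
    """Divide each gene's level by the reference gene's level."""
    ref = levels[reference_gene]
    if ref <= 0:
        raise ValueError("reference gene level must be positive")
    return {g: v / ref for g, v in levels.items() if g != reference_gene}


def expression_ratios(sample_a, sample_b):
    """Ratio of sample_a to sample_b for genes present in both samples."""
    return {
        g: sample_a[g] / sample_b[g]
        for g in sample_a
        if g in sample_b and sample_b[g] != 0
    }


# Hypothetical raw mRNA levels for a diseased and a normal cell.
tumor = {"GAPDH": 200.0, "GENE_A": 400.0, "GENE_B": 50.0}
normal = {"GAPDH": 100.0, "GENE_A": 100.0, "GENE_B": 50.0}

ratios = expression_ratios(
    normalize_to_reference(tumor), normalize_to_reference(normal)
)
```

With these illustrative numbers the normalized ratio is 2.0 for GENE_A and 0.5 for GENE_B, even though the raw level of GENE_B is identical in both samples, which shows why normalization to a reference gene may change the apparent direction of a difference.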

In one embodiment, the expression profiles are expression profiles from cancer cells of one or more subjects, which cells are treated in vivo or in vitro with a drug or cytokine treatment, for example, a combination of TNFα and IFNγ used as a potential therapy of cancer. Expression data of a cell of a subject treated in vitro or in vivo with the drug is entered into a computer and the computer is instructed to compare the data entered to the data in the computer, and to provide results indicating whether the expression data input into the computer are more similar to those of a cell of a subject that is responsive to the drug or more similar to those of a cell of a subject that is not responsive to the drug. Thus, the results indicate whether the subject is likely to respond to the treatment with the drug or unlikely to respond to it.

The invention also provides a machine-readable or computer-readable medium including program instructions for performing the following steps: (i) comparing a plurality of values corresponding to expression levels of one or more genes characteristic of human lung tumor tissue in a query cell with a database including records comprising reference expression or expression profile data of one or more reference cells and an annotation of the type of cell; and (ii) indicating to which cell the query cell is most similar based on similarities of expression profiles. The reference cells may be cells from subjects at different stages of cancer. The reference cells may also be cells from subjects responding or not responding to a particular drug treatment and optionally incubated in vitro or in vivo with the drug.
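Steps (i) and (ii) above may be sketched in Python as follows. The sketch assumes that each database record is a pair of an annotation and an expression profile over the same ordered set of genes, and it uses Pearson correlation as the similarity measure; the text does not prescribe a particular measure, so that choice, the record layout, and the example values are all assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("profiles must not be constant")
    return cov / sqrt(var_a * var_b)


def most_similar(query, database):
    """Return the annotation of the reference record closest to the query."""
    return max(database, key=lambda record: pearson(query, record[1]))[0]


# Hypothetical reference records: (annotation of cell type, profile).
database = [
    ("normal", [1.0, 1.0, 2.0, 1.0]),
    ("stage I", [2.0, 1.0, 4.0, 0.5]),
    ("stage III", [8.0, 0.5, 1.0, 0.2]),
]
result = most_similar([7.5, 0.6, 1.2, 0.3], database)
```

Correlation compares the shape of two profiles rather than their absolute magnitudes, which may be convenient when the query and reference data were scaled differently; a distance-based measure could be substituted without changing the structure of the comparison.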

The reference cells may also be cells from subjects responding or not responding to several different treatments, and the computer system indicates a preferred treatment for the subject. Accordingly, the invention provides a method for selecting a therapy for a patient having cancer, the method comprising: (i) providing the level of expression of one or more genes characteristic of human lung tumor tissue in a diseased cell of the patient; (ii) providing a plurality of reference profiles, each associated with a therapy, wherein the subject expression profile and each reference profile has a plurality of values, each value representing the level of expression of a gene characteristic of cancer; and (iii) selecting the reference profile most similar to the subject expression profile, to thereby select a therapy for said patient. In one embodiment, step (iii) may be performed by a computer. The most similar reference profile may be selected by weighing a comparison value of the plurality using a weight value associated with the corresponding expression data.
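The weighted selection in step (iii) may be illustrated with the short Python sketch below. It assumes that the comparison value for each gene is the absolute difference between the subject value and the reference value and that one weight is supplied per gene; these choices, the therapy names, and the numeric values are hypothetical.

```python
def weighted_distance(subject, reference, weights):
    """Sum of per-gene absolute differences, each scaled by its weight."""
    if not len(subject) == len(reference) == len(weights):
        raise ValueError("profiles and weights must have equal length")
    return sum(w * abs(s - r) for s, r, w in zip(subject, reference, weights))


def select_therapy(subject, reference_profiles, weights):
    """Return the therapy whose reference profile is closest to the subject."""
    return min(
        reference_profiles,
        key=lambda t: weighted_distance(subject, reference_profiles[t], weights),
    )


# Hypothetical subject profile and reference profiles keyed by therapy.
subject = [2.0, 1.0, 3.0]
references = {"therapy A": [2.0, 1.0, 0.0], "therapy B": [0.0, 3.0, 3.0]}

unweighted_choice = select_therapy(subject, references, [1.0, 1.0, 1.0])
weighted_choice = select_therapy(subject, references, [1.0, 1.0, 5.0])
```

With equal weights the subject profile is closer to the reference for therapy A; when the third gene is weighted five-fold, the selection changes to therapy B, showing how a weight value associated with particular expression data may alter which reference profile is considered most similar.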

The relative abundance of an mRNA in two biological samples may be scored as a perturbation and its magnitude determined (i.e., the abundance is different in the two sources of mRNA tested), or as not perturbed (i.e., the relative abundance is the same). In various embodiments, a difference between the two sources of RNA of at least about 25% (the RNA is 25% more abundant in one source than in the other source), more usually about 50%, even more often by a factor of about 2 (twice as abundant), 3 (three times as abundant) or 5 (five times as abundant) is scored as a perturbation. Perturbations may be used by a computer for calculating and comparing expression levels.
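The scoring of a perturbation may be sketched in Python as follows. The function name and the default threshold are assumptions for illustration; a threshold of 1.25 corresponds to a difference of about 25%, 1.5 to about 50%, and 2, 3, or 5 to the larger factors mentioned above.

```python
def score_perturbation(abundance_a, abundance_b, threshold=2.0):
    """Return (is_perturbed, fold) for one mRNA measured in two sources.

    `fold` is the ratio of the larger abundance to the smaller one, so it
    is always at least 1 regardless of which source is more abundant.
    """
    if abundance_a <= 0 or abundance_b <= 0:
        raise ValueError("abundances must be positive")
    fold = max(abundance_a, abundance_b) / min(abundance_a, abundance_b)
    return fold >= threshold, fold
```

For example, abundances of 300 and 100 give a fold value of 3.0 and are scored as a perturbation at the default two-fold threshold, whereas abundances of 110 and 100 are not scored as a perturbation even at the 1.25 threshold.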

Drug Design Using Microarrays

The invention also provides methods for designing and optimizing drugs for cancer, for example, those which have been identified as described herein. In one embodiment, compounds may be screened by comparing the expression level of one or more genes characteristic of human lung tumor tissue following incubation of a diseased cell of cancer or similar cell with the test compound. In another embodiment, the expression level of the genes may be determined using microarrays, and comparing the gene expression profile of a cell in response to the test compound with the gene expression profile of a normal cell corresponding to a diseased cell of cancer (a “reference profile”). In a further embodiment, the expression profile may also be compared to that of a diseased cell of cancer. The comparisons may be done by introducing the gene expression profile data of the cell treated with drug into a computer system comprising reference gene expression profiles, which are stored in a computer readable form, using appropriate algorithms. Test compounds may be screened for those that alter the level of expression of genes characteristic of human lung tumor tissue. Such compounds, that is, compounds which are capable of normalizing the expression of essentially all genes characteristic of human lung tumor tissue, are candidate therapeutics.

The efficacy of the compounds may then be tested in additional in vitro and in vivo assays, and in animal models (e.g., xenograft model). The test compound may be administered to the test animal, and one or more symptoms of the disease may be monitored for improvement of the condition of the animal. Expression of one or more genes characteristic of human lung tumor tissue may also be measured before and after administration of the test compound to the animal. A normalization of the expression of one or more of these genes is indicative of the efficacy of the compound for treating cancer in the animal.

In the clinical setting, obtaining human-derived samples of tissue exhibiting cancer may be difficult, if not prohibitive. Therefore, identification of gene expression changes indicative of efficacy of a therapeutic compound may be determined in a more easily accessible, surrogate cell population, for example, peripheral blood leukocytes (PBLs). This method may be performed either in a human or animal model system. In one embodiment, a test compound may be administered to the test animal (either normal or cancer-containing) at the same doses that have been observed to be efficacious in treating cancer in that animal model. Blood may be drawn from the animal at various time points (e.g., 1, 4, 7, and 24 hours following the first, mid-point, and last day of a regimen of multiple day dosing). Animals dosed with vehicle may be used as controls. RNA may be isolated from PBLs, and can be used to generate probes for hybridization to microarrays. The hybridization results may then be analyzed using computer programs and databases, as described above. The resulting expression profile may be compared directly to the analogous profile from the treated cancer tissue for similarities or simply correlated with efficacy (e.g., in terms of doses and time points) in the animal model.

In another embodiment, human blood may be treated ex vivo with a therapeutic compound at a dose consistent with the therapeutic dose in the animal model, or at a dose that is consistent with known plasma levels of the therapeutic dose in the animal model. The blood may be treated (e.g., rocking at 37° C.) with the therapeutic compound immediately, or after some period of incubation time (e.g., 24 hours) to allow for gene expression to re-equilibrate after the blood draw. The blood may also be treated with the therapeutic compound for various timepoints (e.g., 4 and 24 hours), and then PBL RNA isolated and used to create a probe for hybridization to a microarray. A compound solubilization agent (e.g., DMSO) may be used as a control. The resulting expression profile may be compared directly to the analogous profile from the treated cancer tissue for similarities or simply correlated with efficacy (e.g., in terms of doses and time points) in the animal model.

The toxicity of the candidate therapeutic compound may be evaluated, for example, by determining whether the compound induces the expression of genes known to be associated with a toxic response. Expression of such toxicity related genes may be determined in different cell types, for example, those that are known to express the genes. In fact, alterations in gene expression may serve as a more sensitive marker of human toxicity than routine preclinical safety studies. Microarrays may be used for detecting changes in the expression of genes known to be associated with a toxic response. It may be possible to perform proof of concept studies demonstrating that changes in gene expression levels may predict toxic events that were not identified by routine preclinical safety testing (see, e.g., Huang, et al., Toxicol. Sci. 63:196-207, 2001; Waring, et al., Toxicol. Appl. Pharmacol. 175:28-42, 2001).

Drug screening may be performed by adding a test compound to a sample of cells, and monitoring the effect. A parallel sample which does not receive the test compound may also be monitored as a control. The treated and untreated cells are then compared by any suitable phenotypic criteria, including but not limited to microscopic analysis, viability testing, ability to replicate, histological examination, the level of a particular RNA or polypeptide associated with the cells, the level of enzymatic activity expressed by the cells or cell lysates, and the ability of the cells to interact with other cells or compounds. Differences between treated and untreated cells indicate effects attributable to the test compound.

Desirable effects of a test compound include an effect on any phenotype that was conferred by the cancer-associated marker nucleic acid sequence. Examples include a test compound that limits the overabundance of mRNA, limits production of the encoded protein, or limits the functional effect of the protein. The effect of the test compound would be apparent when comparing results between treated and untreated cells.

Diagnostic and Prognostic Assays

The present invention provides nucleic acid sequences which are differentially regulated in cancer, and a method for identifying such sequences. The present invention provides a method for identifying a nucleotide sequence which is differentially regulated in a subject with cancer, comprising: hybridizing a nucleic acid sample corresponding to RNA obtained from the subject to a nucleic acid sample comprising one or more nucleic acid molecules of known identity; and measuring the hybridization of the nucleic acid sample to the one or more nucleic acid molecules of known identity, wherein a difference in the hybridization of the nucleic acid sample to the one or more nucleic acid molecules of known identity relative to a nucleic acid sample obtained from a subject without cancer is indicative of the differential expression of the nucleotide sequence in a subject with cancer.

Generally, the present invention provides a method for identifying nucleic acid sequences which are differentially regulated in a subject with cancer comprising isolating messenger RNA from a subject, generating cRNA from the mRNA sample, hybridizing the cRNA to a microarray comprising a plurality of nucleic acid molecules stably associated with discrete locations on the array, and identifying patterns of hybridization of the cRNA to the array. According to the present invention, a nucleic acid molecule which hybridizes to a given location on the array is said to be differentially regulated if the hybridization signal is, for example, at least two-fold higher or lower than the hybridization signal at the same location on an identical array hybridized with a nucleic acid sample obtained from a subject that does not have cancer.

Expression patterns may be used to derive a panel of biomarkers that can be used to predict the efficacy of drug treatment in the patients. The biomarkers may consist of gene expression levels from microarray experiments on RNA isolated from biological samples, RNA isolated from frozen samples of tumor biopsies, or mass spectrometry-derived protein masses in the serum.

Although the precise mechanism for data analysis will depend upon the exact nature of the data, a typical procedure for developing a panel of biomarkers is as follows. The data (gene expression levels or mass spectra) are collected for each patient prior to treatment. As the study progresses, the patients are classified according to their response to the drug treatment, either as efficacious or non-efficacious. Multiple levels of efficacy can be accommodated in a data model, but a binary comparison is considered optimal, particularly if the patient population is less than several hundred. Assuming adequate numbers of patients in each class, the protein and/or gene expression data may be analyzed by a number of techniques known in the art. Many of the techniques are derived from traditional statistics as well as from the field of machine learning. These techniques serve two purposes:

1. Reduce the dimensionality of data: in the case of mass spectra or gene expression microarrays, data is reduced from many thousands of individual data points to about three to ten. The reduction is based upon the predictive power of the data points when taken as a set.

2. Training: these three to ten data points are then used to train multiple machine learning algorithms which then “learn” to recognize, in this case, patterns of protein masses or gene expression which distinguish efficacious drug treatment from non-efficacious. All patient samples can be used to train the algorithms.

The resulting trained algorithms are then tested in order to measure their predictive power. Typically, when less than many hundreds of training examples are available, some form of cross-validation is performed. To illustrate, consider a ten-fold cross validation. In this case, patient samples are randomly assigned to one of ten bins. In the first round of validation the samples in nine of the bins are used for training and the remaining samples in the tenth bin are used to test the algorithm. This is repeated an additional nine times, each time leaving out the samples in a different bin for testing. The results (correct predictions and errors) from all ten rounds are combined and the predictive power is then assessed. Different algorithms, as well as different panels, may be compared in this way for this study. The “best” algorithm/panel combination will then be selected. This “smart” algorithm may then be used in future studies to select the patients that are most likely to respond to treatment.
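The ten-fold cross-validation described above may be sketched in Python as follows. The nearest-centroid classifier is a deliberately simple stand-in chosen for illustration and is not the algorithm of the invention; the function names, the fixed random seed, and the synthetic patient data are likewise assumptions.

```python
import random


def assign_bins(n_samples, n_bins=10, seed=0):
    """Randomly assign sample indices to n_bins bins of near-equal size."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::n_bins] for i in range(n_bins)]


def train_centroids(samples, labels):
    """Stand-in training step: compute the mean profile of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Predict the class whose mean profile is nearest to x."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def cross_validate(samples, labels, n_bins=10, seed=0):
    """Return the fraction of held-out samples predicted correctly."""
    correct = 0
    for held_out in assign_bins(len(samples), n_bins, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_centroids(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        # Results from every round are pooled before accuracy is computed.
        correct += sum(predict(model, samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


# Synthetic two-marker panel for 20 hypothetical patients.
samples = [[5.0 + 0.1 * i, 1.0] for i in range(10)]
samples += [[1.0, 5.0 + 0.1 * i] for i in range(10)]
labels = ["efficacious"] * 10 + ["non-efficacious"] * 10
accuracy = cross_validate(samples, labels)
```

Because each sample is held out exactly once, the pooled accuracy estimates how the trained algorithm may perform on patients it has not seen; different algorithm and panel combinations could be compared by calling `cross_validate` with the same bins.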

Many algorithms benefit from additional information taken for the patients. For example, gender or age could be used to improve predictive power. Also, data transformations such as normalization and smoothing may be used to reduce noise. Because of this, a large number of algorithms may be trained using many different parameters in order to optimize the outcome. If predictive patterns exist in the data, it is likely that an optimal, or near-optimal, “smart” algorithm can be developed. If more patient samples become available, the algorithm can be retrained to take advantage of the new data.

As an example using mass spectrometry, plasma may be applied to a hydrophobic SELDI-target, washed extensively in water, and analyzed by the SELDI-TOF mass spectrometer. This may be repeated on 100 or more patient samples. The protein profiles resulting from the intensities of some 16,000 m/z values in each sample would be statistically analyzed in order to identify sets of specific m/z values that are predictive of drug efficacy. Identical experiments using other SELDI-targets, such as ion-exchange or IMAC surfaces, could also be conducted. These will capture different subsets of the proteins present in plasma. Furthermore, the plasma may be denatured and prefractionated prior to application onto the SELDI target.

The present invention provides methods for determining whether a subject is at risk for developing a disease or condition characterized by unwanted cell proliferation by detecting biomarkers, that is, nucleic acids and/or polypeptide markers for cancer.

In clinical applications, human tissue samples may be screened for the presence and/or absence of biomarkers identified herein. Such samples could consist of needle biopsy cores, surgical resection samples, lymph node tissue, or serum. For example, these methods include obtaining a biopsy, which is optionally fractionated by cryostat sectioning to enrich tumor cells to about 80% of the total cell population. In certain embodiments, nucleic acids extracted from these samples may be amplified using techniques well known in the art. The levels of selected markers detected would be compared with statistically valid groups of metastatic, non-metastatic malignant, benign, or normal tissue samples.

In one embodiment, the diagnostic method comprises determining whether a subject has an abnormal mRNA and/or protein level of the biomarkers, such as by Northern blot analysis, reverse transcription-polymerase chain reaction (RT-PCR), in situ hybridization, immunoprecipitation, Western blot hybridization, or immunohistochemistry. According to the method, cells may be obtained from a subject and the protein or mRNA levels of the biomarkers are determined and compared to the levels of these markers in a healthy subject. An abnormal level of biomarker polypeptide or mRNA is likely to be indicative of cancer.

In one embodiment, the method comprises using a nucleic acid probe to determine the presence of cancerous cells in a tissue from a patient. Specifically, the method comprises:

1. providing a nucleic acid probe comprising a nucleotide sequence, for example, at least 10, 15, 25 or 40 nucleotides, and up to all or nearly all of the coding sequence which is complementary to a portion of the coding sequence of a nucleic acid sequence and is differentially expressed in tumor cells;

2. obtaining from a patient a first tissue sample potentially comprising cancerous cells;

3. providing a second tissue sample containing cells substantially all of which are non-cancerous;

4. contacting the nucleic acid probe under stringent conditions with RNA of each of said first and second tissue samples (e.g., in a Northern blot or in situ hybridization assay); and

5. comparing (a) the amount of hybridization of the probe with RNA of the first tissue sample, with (b) the amount of hybridization of the probe with RNA of the second tissue sample; wherein a statistically significant difference in the amount of hybridization with the RNA of the first tissue sample as compared to the amount of hybridization with the RNA of the second tissue sample is indicative of the presence of cancerous cells in the first tissue sample.

In one aspect, the method comprises in situ hybridization with a probe derived from a given marker nucleic acid sequence. The method comprises contacting the labeled hybridization probe with a sample of a given type of tissue potentially containing cancerous or pre-cancerous cells as well as normal cells, and determining whether the probe labels some cells of the given tissue type to a degree significantly different (e.g., by at least a factor of two, or at least a factor of five, or at least a factor of twenty, or at least a factor of fifty) than the degree to which it labels other cells of the same tissue type.

Also within the invention is a method of determining the phenotype of a test cell from a given human tissue, for example, whether the cell is (a) normal, or (b) cancerous or precancerous, by contacting the mRNA of a test cell with a nucleic acid probe, for example, at least about 10, 15, 25, or 40 nucleotides, and up to all or nearly all of a sequence which is complementary to a portion of the coding sequence of a nucleic acid sequence, and which is differentially expressed in tumor cells as compared to normal cells of the given tissue type; and determining the approximate amount of hybridization of the probe to the mRNA, an amount of hybridization either more or less than that seen with the mRNA of a normal cell of that tissue type being indicative that the test cell is cancerous or pre-cancerous.

Alternatively, the above diagnostic assays may be carried out using antibodies to detect the protein product encoded by the marker nucleic acid sequence. Accordingly, in one embodiment, the assay would include contacting the proteins of the test cell with an antibody specific for the gene product of a marker nucleic acid, the marker nucleic acid being one which is expressed at a given control level in normal cells of the same tissue type as the test cell, and determining the approximate amount of immunocomplex formation by the antibody and the proteins of the test cell, wherein a statistically significant difference in the amount of the immunocomplex formed with the proteins of a test cell as compared to a normal cell of the same tissue type is an indication that the test cell is cancerous or pre-cancerous.

The method for producing polyclonal and/or monoclonal antibodies which specifically bind to polypeptides useful in the present invention is known to those of skill in the art and may be found in, for example, Dymecki, et al., (J. Biol. Chem. 267:4815, 1992); Boersma & Van Leeuwen, (J. Neurosci. Methods 51:317, 1994); Green, et al., (Cell 28:477, 1982); and Arnheiter, et al., (Nature 294:278, 1981).

Another such method includes the steps of: providing an antibody specific for the gene product of a marker nucleic acid sequence, the gene product being present in cancerous tissue of a given tissue type at a level more or less than the level of the gene product in non-cancerous tissue of the same tissue type; obtaining from a patient a first sample of tissue of the given tissue type, which sample potentially includes cancerous cells; providing a second sample of tissue of the same tissue type (which may be from the same patient or from a normal control, e.g. another individual or cultured cells), this second sample containing normal cells and essentially no cancerous cells; contacting the antibody with protein (which may be partially purified, in lysed but unfractionated cells, or in situ) of the first and second samples under conditions permitting immunocomplex formation between the antibody and the marker nucleic acid sequence product present in the samples; and comparing (a) the amount of immunocomplex formation in the first sample, with (b) the amount of immunocomplex formation in the second sample, wherein a statistically significant difference in the amount of immunocomplex formation in the first sample as compared to the amount of immunocomplex formation in the second sample is indicative of the presence of cancerous cells in the first sample of tissue.

The subject invention further provides a method of determining whether a cell sample obtained from a subject possesses an abnormal amount of marker polypeptide which comprises (a) obtaining a cell sample from the subject, (b) quantitatively determining the amount of the marker polypeptide in the sample so obtained, and (c) comparing the amount of the marker polypeptide so determined with a known standard, so as to thereby determine whether the cell sample obtained from the subject possesses an abnormal amount of the marker polypeptide. Such marker polypeptides may be detected by immunohistochemical assays, dot-blot assays, ELISA, and the like.

Immunoassays are commonly used to quantitate the levels of proteins in cell samples, and many other immunoassay techniques are known in the art. The invention is not limited to a particular assay procedure, and therefore, is intended to include both homogeneous and heterogeneous procedures. Exemplary immunoassays which may be conducted according to the invention include fluorescence polarization immunoassay (FPIA), fluorescence immunoassay (FIA), enzyme immunoassay (EIA), nephelometric inhibition immunoassay (NIA), enzyme-linked immunosorbent assay (ELISA), and radioimmunoassay (RIA). An indicator moiety, or label group, may be attached to the subject antibodies and is selected so as to meet the needs of various uses of the method which are often dictated by the availability of assay equipment and compatible immunoassay procedures. General techniques to be used in performing the various immunoassays noted above are known to those of ordinary skill in the art.

In another embodiment, the level of the encoded product, or alternatively the level of the polypeptide, in a biological fluid (e.g., blood or urine) of a patient may be determined as a way of monitoring the level of expression of the marker nucleic acid sequence in cells of that patient. Such a method would include the steps of obtaining a sample of a biological fluid from the patient, contacting the sample (or proteins from the sample) with an antibody specific for an encoded marker polypeptide, and determining the amount of immune complex formation by the antibody, with the amount of immune complex formation being indicative of the level of the marker encoded product in the sample. This determination is particularly instructive when compared to the amount of immune complex formation by the same antibody in a control sample taken from a normal individual or in one or more samples previously or subsequently obtained from the same person.

In another embodiment, the method may be used to determine the amount of marker polypeptide present in a cell, which in turn may be correlated with progression of a hyperproliferative disorder. The level of the marker polypeptide may be used predictively to evaluate whether a sample of cells contains cells which are, or are predisposed towards becoming, transformed cells. Moreover, the subject method may be used to assess the phenotype of cells which are known to be transformed, the phenotyping results being useful in planning a particular therapeutic regimen. For example, very high levels of the marker polypeptide in sample cells are a powerful diagnostic and prognostic marker for a cancer. The observation of marker polypeptide levels may be utilized in decisions regarding, for example, the use of more aggressive therapies.

As set out above, one aspect of the present invention relates to diagnostic assays for determining, in the context of cells isolated from a patient, if the level of a marker polypeptide is significantly reduced in the sample cells. The term “significantly reduced” refers to a cell phenotype wherein the cell possesses a reduced cellular amount of the marker polypeptide relative to a normal cell of similar tissue origin. For example, a cell may have less than about 50%, 25%, 10%, or 5% of the marker polypeptide compared to that of a normal control cell. In particular, the assay evaluates the level of marker polypeptide in the test cells, and may compare the measured level with marker polypeptide detected in at least one control cell, for example, a normal cell and/or a transformed cell of known phenotype.
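By way of illustration only, the comparison of a test cell against a normal control cell may be sketched as follows. The function name `reduction_band`, the use of strict cutoffs in place of "about," and the choice to report the tightest cutoff satisfied are assumptions made for this sketch and are not part of the assay itself.

```python
def reduction_band(test_level, control_level, thresholds=(0.50, 0.25, 0.10, 0.05)):
    """Return the tightest cutoff that the test/control ratio falls below.

    A return value of None means the test cell retains at least 50% of the
    control level and would not be called "significantly reduced" here.
    The strict cutoffs are an assumption; the text says "about".
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    ratio = test_level / control_level
    satisfied = [t for t in thresholds if ratio < t]
    return min(satisfied) if satisfied else None


# A test cell with 8 units against a control of 100 units holds 8% of the
# control level, so the tightest cutoff it falls below is 10%.
band = reduction_band(8, 100)
```

In practice the measured levels would come from one of the immunoassays described above, and the control could equally be a transformed cell of known phenotype.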

Of particular importance to the subject invention is the ability to quantitate the level of marker polypeptide as determined by the number of cells associated with a normal or abnormal marker polypeptide level. The number of cells with a particular marker polypeptide phenotype may then be correlated with patient prognosis. In one embodiment of the invention, the marker polypeptide phenotype of a lesion is determined as a percentage of cells in a biopsy which are found to have abnormally high/low levels of the marker polypeptide. Such expression may be detected by immunohistochemical assays, dot-blot assays, ELISA, and the like.
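As a minimal sketch of the percentage calculation described above, assuming that each cell in the biopsy has already been assigned a numeric marker level and that a normal range is known, the phenotype of a lesion may be expressed as follows. The function name and the range bounds are hypothetical.

```python
def percent_abnormal(cell_levels, normal_low, normal_high):
    """Percentage of cells whose marker level lies outside the normal range."""
    if not cell_levels:
        raise ValueError("at least one cell is required")
    abnormal = sum(
        1 for level in cell_levels if level < normal_low or level > normal_high
    )
    return 100.0 * abnormal / len(cell_levels)


# Four scored cells, two of which fall outside the assumed normal range 2-10.
lesion_phenotype = percent_abnormal([1, 5, 5, 12], normal_low=2, normal_high=10)
```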

Where tissue samples are employed, immunohistochemical staining may be used to determine the number of cells having the marker polypeptide phenotype. For such staining, a multiblock of tissue may be taken from the biopsy or other tissue sample and subjected to proteolytic hydrolysis, employing such agents as proteinase K or pepsin. In certain embodiments, it may be desirable to isolate a nuclear fraction from the sample cells and detect the level of the marker polypeptide in the nuclear fraction.

The tissue samples are fixed by treatment with a reagent such as formalin, glutaraldehyde, methanol, or the like. The samples are then incubated with an antibody (e.g., a monoclonal antibody) with binding specificity for the marker polypeptides. This antibody may be conjugated to a label for subsequent detection of binding. Samples are incubated for a time sufficient for formation of the immunocomplexes. Binding of the antibody is then detected by virtue of a label conjugated to this antibody. Where the antibody is unlabeled, a second labeled antibody may be employed, for example, which is specific for the isotype of the anti-marker polypeptide antibody. Examples of labels which may be employed include radionuclides, fluorescers, chemiluminescers, enzymes, and the like.

Where enzymes are employed, the substrate for the enzyme may be added to the samples to provide a colored or fluorescent product. Examples of suitable enzymes for use in conjugates include horseradish peroxidase, alkaline phosphatase, malate dehydrogenase, and the like. Where not commercially available, such antibody-enzyme conjugates are readily produced by techniques known to those skilled in the art.

In one embodiment, the assay is performed as a dot blot assay. The dot blot assay finds particular application where tissue samples are employed as it allows determination of the average amount of the marker polypeptide associated with a single cell by correlating the amount of marker polypeptide in a cell-free extract produced from a predetermined number of cells.
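The per-cell average referred to above is a simple quotient, which may be sketched as follows. The units (nanograms in the extract, picograms per cell) and the function name are assumptions chosen for illustration.

```python
def marker_per_cell_pg(extract_marker_ng, cells_in_extract):
    """Average marker polypeptide per cell, in picograms.

    extract_marker_ng: total marker measured in the cell-free extract (ng).
    cells_in_extract:  predetermined number of cells used to make the extract.
    """
    if cells_in_extract <= 0:
        raise ValueError("cell count must be positive")
    return extract_marker_ng * 1000.0 / cells_in_extract


# 50 ng of marker recovered from an extract of one million cells.
average_pg = marker_per_cell_pg(50, 1_000_000)
```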

It is well established in the cancer literature that tumor cells of the same type (e.g., lung and/or colon tumor cells) may not show uniformly increased expression of individual oncogenes or uniformly decreased expression of individual tumor suppressor genes. There may also be varying levels of expression of a given marker gene even between cells of a given type of cancer, further emphasizing the need for reliance on a battery of tests rather than a single test. Accordingly, in one aspect, the invention provides for a battery of tests utilizing a number of probes of the invention, in order to improve the reliability and/or accuracy of the diagnostic test.

In one embodiment, the present invention also provides a method wherein nucleic acid probes are immobilized on a DNA chip in an organized array. Oligonucleotides may be bound to a solid support by a variety of processes, including lithography. For example, a chip may hold up to 250,000 oligonucleotides. These nucleic acid probes comprise a nucleotide sequence, for example, at least about 12, 15, 25, or 40 nucleotides in length, and up to all or nearly all of a sequence which is complementary to a portion of the coding sequence of a marker nucleic acid sequence and is differentially expressed in tumor cells. The present invention provides significant advantages over the available tests for various cancers, because it increases the reliability of the test by providing an array of nucleic acid markers on a single chip.

The method includes obtaining a biopsy, which is optionally fractionated by cryostat sectioning to enrich tumor cells to about 80% of the total cell population. The DNA or RNA is then extracted, amplified, and analyzed with a DNA chip to determine the presence or absence of the marker nucleic acid sequences.

In one embodiment, the nucleic acid probes are spotted onto a substrate in a two-dimensional matrix or array. Samples of nucleic acids may be labeled and then hybridized to the probes. Double-stranded nucleic acids, comprising the labeled sample nucleic acids bound to probe nucleic acids, may be detected once the unbound portion of the sample is washed away.

The probe nucleic acids may be spotted on substrates including glass, nitrocellulose, etc. The probes can be bound to the substrate by either covalent bonds or by non-specific interactions, such as hydrophobic interactions. The sample nucleic acids can be labeled using radioactive labels, fluorophores, chromophores, etc.

In yet another embodiment, the invention contemplates using a panel of antibodies which are generated against the marker polypeptides of this invention. Such a panel of antibodies may be used as a reliable diagnostic probe for cancer. The assay of the present invention comprises contacting a biopsy sample containing cells, for example, lung cells, with a panel of antibodies to one or more of the encoded products to determine the presence or absence of the marker polypeptides.

The diagnostic methods of the subject invention may also be employed as follow-up to treatment; for example, quantitation of the level of marker polypeptides may be indicative of the effectiveness of current or previously employed cancer therapies as well as the effect of these therapies upon patient prognosis.

In addition, the marker nucleic acids or marker polypeptides may be utilized as part of a diagnostic panel for initial detection, follow-up screening, detection of reoccurrence, and post-treatment monitoring for chemotherapy or surgical treatment.

Accordingly, the present invention makes available diagnostic assays and reagents for detecting gain and/or loss of marker polypeptides from a cell in order to aid in the diagnosis and phenotyping of proliferative disorders arising from, for example, tumorigenic transformation of cells.

The diagnostic assays described above may be adapted to be used as prognostic assays, as well. Such an application takes advantage of the sensitivity of the assays of the invention to events which take place at characteristic stages in the progression of a tumor. For example, a given marker gene may be up- or down-regulated at a very early stage, perhaps before the cell is irreversibly committed to developing into a malignancy, while another marker gene may be characteristically up- or down-regulated only at a much later stage. Such a method could involve the steps of contacting the mRNA of a test cell with a nucleic acid probe derived from a given marker nucleic acid which is expressed at different characteristic levels in cancerous or precancerous cells at different stages of tumor progression, and determining the approximate amount of hybridization of the probe to the mRNA of the cell, such amount being an indication of the level of expression of the gene in the cell, and thus an indication of the stage of tumor progression of the cell; alternatively, the assay may be carried out with an antibody specific for the gene product of the given marker nucleic acid, contacted with the proteins of the test cell. A battery of such tests will disclose not only the existence and location of a tumor, but also will allow the clinician to select the mode of treatment most appropriate for the tumor, and to predict the likelihood of success of that treatment.

The methods of the invention may also be used to follow the clinical course of a tumor. For example, the assay of the invention may be applied to a tissue sample from a patient; following treatment of the patient for the cancer, another tissue sample is taken and the test repeated. Successful treatment will result in either removal of all cells which demonstrate differential expression characteristic of the cancerous or precancerous cells, or a substantial increase in expression of the gene in those cells, perhaps approaching or even surpassing normal levels.

In yet another embodiment, the invention provides methods for determining whether a subject is at risk for developing a disease, such as a predisposition to develop cancer, associated with aberrant activity of a polypeptide, wherein the aberrant activity of the polypeptide is characterized by detecting the presence or absence of a genetic lesion characterized by at least one of (a) an alteration affecting the integrity of a gene encoding a marker polypeptide, or (b) the mis-expression of the encoding nucleic acid. To illustrate, such genetic lesions may be detected by ascertaining the existence of at least one of (i) a deletion of one or more nucleotides from the nucleic acid sequence, (ii) an addition of one or more nucleotides to the nucleic acid sequence, (iii) a substitution of one or more nucleotides of the nucleic acid sequence, (iv) a gross chromosomal rearrangement of the nucleic acid sequence, (v) a gross alteration in the level of a messenger RNA transcript of the nucleic acid sequence, (vi) aberrant modification of the nucleic acid sequence, such as of the methylation pattern of the genomic DNA, (vii) the presence of a non-wild type splicing pattern of a messenger RNA transcript of the gene, (viii) a non-wild type level of the marker polypeptide, (ix) allelic loss of the gene, and/or (x) inappropriate post-translational modification of the marker polypeptide.

The present invention provides assay techniques for detecting lesions in the encoding nucleic acid sequence. These methods include, but are not limited to, methods involving sequence analysis, Southern blot hybridization, restriction enzyme site mapping, and methods involving detection of absence of nucleotide pairing between the nucleic acid to be analyzed and a probe.

Specific diseases or disorders, for example, genetic diseases or disorders, are associated with specific allelic variants of polymorphic regions of certain genes, which do not necessarily encode a mutated protein. Thus, the presence of a specific allelic variant of a polymorphic region of a gene in a subject may render the subject susceptible to developing a specific disease or disorder. Polymorphic regions in genes may be identified by determining the nucleotide sequence of genes in populations of individuals. If a polymorphic region is identified, then the link with a specific disease may be determined by studying specific populations of individuals, for example, individuals who developed a specific disease, such as cancer. A polymorphic region may be located in any region of a gene, for example, exons, in coding or non-coding regions of exons, introns, and promoter regions.

In an exemplary embodiment, there is provided a nucleic acid composition comprising a nucleic acid probe including a region of nucleotide sequence which is capable of hybridizing to a sense or antisense sequence of a gene or naturally occurring mutants thereof, or 5′ or 3′ flanking sequences or intronic sequences naturally associated with the subject genes or naturally occurring mutants thereof. The nucleic acid of a cell is rendered accessible for hybridization, the probe is contacted with the nucleic acid of the sample, and the hybridization of the probe to the sample nucleic acid is detected. Such techniques may be used to detect lesions or allelic variants at either the genomic or mRNA level, including deletions, substitutions, etc., as well as to determine mRNA transcript levels.

An example of a detection method is allele specific hybridization using probes overlapping the mutation or polymorphic site and having about 5, 10, 20, 25, or 30 nucleotides around the mutation or polymorphic region. In one embodiment of the invention, several probes capable of hybridizing specifically to allelic variants are attached to a solid phase support, for example, a “chip.” Mutation detection analysis using these chips comprising oligonucleotides, also termed “DNA probe arrays” is described, for example, by Cronin, et al., (Human Mutation 7:244, 1996). In one embodiment, a chip may comprise all the allelic variants of at least one polymorphic region of a gene. The solid phase support is then contacted with a test nucleic acid and hybridization to the specific probes is detected. Accordingly, the identity of numerous allelic variants of one or more genes may be identified in a simple hybridization experiment.
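For purposes of illustration, a set of allele-specific probes overlapping a single polymorphic site may be derived from a reference sequence as sketched below. The sketch assumes a single-nucleotide polymorphism, a zero-based site position on the sense strand, and probes written as the reverse complement of the sense strand so that they hybridize to it; the function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_probes(reference, site, alleles, flank=10):
    """Build one probe per allele, with `flank` nucleotides on each side.

    reference: sense-strand sequence containing the polymorphic site.
    site:      zero-based index of the polymorphic nucleotide (assumption).
    alleles:   iterable of single-nucleotide variants to cover.
    """
    if not 0 <= site < len(reference):
        raise ValueError("site lies outside the reference sequence")
    start = max(0, site - flank)
    end = min(len(reference), site + flank + 1)
    probes = {}
    for allele in alleles:
        window = reference[start:site] + allele + reference[site + 1:end]
        probes[allele] = reverse_complement(window)
    return probes


# Two probes, one per allele, each spanning the site plus 2 nt on either side.
probes = allele_probes("AAACCCGGGTTT", site=5, alleles="CT", flank=2)
```

Each probe in the returned mapping could then be attached to a distinct position on the solid phase support, so that hybridization at a given position identifies the allele present in the test nucleic acid.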

In certain embodiments, detection of the lesion comprises utilizing the probe/primer in a polymerase chain reaction (PCR) (see, e.g., U.S. Pat. Nos. 4,683,195 and 4,683,202), such as anchor PCR or RACE PCR, or, alternatively, in a ligase chain reaction (LCR) (see, e.g., Landegren, et al., Science 241:1077-1080, 1988; Nakazawa, et al., Proc. Natl. Acad. Sci. USA 91:360-364, 1994), the latter of which can be particularly useful for detecting point mutations in the gene (see, e.g., Abravaya, et al., Nuc. Acid Res. 23:675-682, 1995). In an illustrative embodiment, the method includes the steps of (i) collecting a sample of cells from a patient, (ii) isolating nucleic acid (e.g., genomic, mRNA, or both) from the cells of the sample, (iii) contacting the nucleic acid sample with one or more primers which specifically hybridize to a nucleic acid sequence under conditions such that hybridization and amplification of the nucleic acid (if present) occurs, and (iv) detecting the presence or absence of an amplification product, or detecting the size of the amplification product and comparing the length to a control sample. It is anticipated that PCR and/or LCR may be desirable to use as a preliminary amplification step in conjunction with any of the techniques used for detecting mutations described herein.

The invention thus also encompasses methods of screening for agents which inhibit or enhance the expression of the nucleic acid markers in vitro, comprising exposing a cell or tissue in which the marker nucleic acid mRNA is detectable in cultured cells to an agent in order to determine whether the agent is capable of inhibiting or enhancing production of the mRNA; and determining the level of mRNA in the exposed cells or tissue, wherein a decrease in the level of the mRNA after exposure of the cell line to the agent is indicative of inhibition of the marker nucleic acid mRNA production and an increase in mRNA levels is indicative of enhancement of marker mRNA production.

Alternatively, the screening method may include exposing, in vitro, a cell or tissue in which marker protein is detectable in cultured cells to an agent suspected of inhibiting or enhancing production of the marker protein; and determining the level of the marker protein in the cells or tissue, wherein a decrease in the level of marker protein after exposure of the cells or tissue to the agent is indicative of inhibition of marker protein production and an increase in the level of marker protein is indicative of enhancement of marker protein production.

The invention also encompasses in vivo methods of screening for agents which inhibit or enhance expression of the marker nucleic acids, comprising exposing a subject having tumor cells in which marker mRNA or protein is detectable to an agent suspected of inhibiting or enhancing production of marker mRNA or protein; and determining the level of marker mRNA or protein in tumor cells of the exposed mammal. A decrease in the level of marker mRNA or protein after exposure of the subject to the agent is indicative of inhibition of marker nucleic acid expression and an increase in the level of marker mRNA or protein is indicative of enhancement of marker nucleic acid expression.

Accordingly, the invention provides a method comprising incubating a cell expressing the marker nucleic acids with a test compound and measuring the mRNA or protein level. The invention further provides a method for quantitatively determining the level of expression of the marker nucleic acids in a cell population, and a method for determining whether an agent is capable of increasing or decreasing the level of expression of the marker nucleic acids in a cell population. The method for determining whether an agent is capable of increasing or decreasing the level of expression of the marker nucleic acids in a cell population comprises the steps of (a) preparing cell extracts from control and agent-treated cell populations, (b) isolating the marker polypeptides from the cell extracts, and (c) quantifying (e.g., in parallel) the amount of an immunocomplex formed between the marker polypeptide and an antibody specific to said polypeptide. The marker polypeptides of this invention may also be quantified by assaying for their bioactivity. Agents that induce an increase in the marker nucleic acid expression may be identified by their ability to increase the amount of immunocomplex formed in the treated cell as compared with the amount of the immunocomplex formed in the control cell. In a similar manner, agents that decrease expression of the marker nucleic acid may be identified by their ability to decrease the amount of the immunocomplex formed in the treated cell extract as compared to the control cell.
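The comparison of treated and control extracts in step (c) may be sketched as follows. This is an illustration under stated assumptions: replicate immunocomplex measurements are averaged, and a fold-change cutoff (`min_fold`, here 1.5) stands in for whatever significance criterion is actually applied. Neither the cutoff nor the function name is specified above.

```python
from statistics import mean


def classify_agent(control_amounts, treated_amounts, min_fold=1.5):
    """Label an agent by the fold change in immunocomplex formed.

    control_amounts / treated_amounts: replicate immunocomplex measurements
    from control and agent-treated cell extracts, in the same units.
    min_fold: assumed cutoff; a change smaller than this is not called.
    """
    control = mean(control_amounts)
    treated = mean(treated_amounts)
    if control <= 0:
        raise ValueError("control immunocomplex amount must be positive")
    fold = treated / control
    if fold >= min_fold:
        return "enhancer", fold
    if fold <= 1 / min_fold:
        return "inhibitor", fold
    return "no clear effect", fold


# Treated extracts form three times the immunocomplex of the controls.
label, fold_change = classify_agent([10, 10], [30, 30])
```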

Predictive Assays

Laboratory-based assays, which can predict clinical benefit from a given anti-cancer agent, will greatly enhance the clinical management of patients with cancer. In order to assess this effect, a biomarker associated with the anti-cancer agent may be analyzed in a biological sample (e.g., tumor sample, plasma) before, during, and following treatment.

Another approach to monitor treatment is an evaluation of serum proteomic spectra. Specifically, plasma samples may be subjected to mass spectroscopy (e.g., surface-enhanced laser desorption and ionization) and a proteomic spectrum may be generated for each patient. A set of spectra, derived from analysis of plasma from patients before and during treatment, may be analyzed by an iterative searching algorithm, which can identify a proteomic pattern that completely discriminates the treated samples from the untreated samples. The resulting pattern may then be used to predict the clinical benefit following treatment.

Global gene expression profiling of biological samples (e.g., tumor biopsy samples, blood samples) and bioinformatics-driven pattern identification may be utilized to predict clinical benefit and sensitivity, as well as development of resistance to an anti-cancer agent. For example, RNA isolated from cells derived from whole blood from patients before and during treatment may be used to generate blood cell gene expression profiles utilizing Affymetrix GeneChip technology and algorithms. These gene expression profiles may then predict the clinical benefit from treatment with a particular anti-cancer agent.

Analysis of the biochemical composition of urine by 1D ¹H-NMR (Nuclear Magnetic Resonance) may also be utilized as a predictive assay. Pattern recognition techniques may be used to evaluate the metabolic response to treatment with an anti-cancer agent and to correlate this response with clinical endpoints. The biochemical or endogenous metabolites excreted in urine have been well-characterized by proton NMR for normal subjects (Zuppi, et al., Clin Chim Acta 265:85-97, 1997). These metabolites (approximately 30-40) represent the by-products of the major metabolic pathways, such as the citric acid and urea cycles. Drug-, disease-, and genetic-stimuli have been shown to produce metabolic-specific changes in baseline urine profiles that are indicative of the timeline and magnitude of the metabolic response to the stimuli. These analyses are multivariate and therefore use pattern recognition techniques to improve data interpretation. Urinary metabolic profiles may be correlated with clinical endpoints to determine the clinical benefit.

Kits

The invention further provides kits for determining the expression level of genes characteristic of human lung tumor tissue. The kits may be useful for identifying subjects that are predisposed to developing cancer or who have cancer, as well as for identifying and validating therapeutics for cancer. In one embodiment, the kit comprises a computer readable medium on which is stored one or more gene expression profiles of diseased cells of cancer, or at least values representing levels of expression of one or more genes characteristic of human lung tumor tissue in a diseased cell. The computer readable medium can also comprise gene expression profiles of counterpart normal cells, diseased cells treated with a drug, and any other gene expression profile described herein. The kit can comprise expression profile analysis software capable of being loaded into the memory of a computer system.

A kit can comprise a microarray comprising probes of genes characteristic of human lung tumor tissue. A kit can comprise one or more probes or primers for detecting the expression level of one or more genes characteristic of human lung tumor tissue and/or a solid support on which probes are attached and which can be used for detecting expression of one or more genes characteristic of human lung tumor tissue in a sample. A kit may further comprise nucleic acid controls, buffers, and instructions for use.

Other kits provide compositions for treating cancer. For example, a kit can also comprise one or more nucleic acids corresponding to one or more genes characteristic of human lung tumor tissue (e.g., for use in treating a patient having cancer). The nucleic acids can be included in a plasmid or a vector (e.g., a viral vector). Other kits comprise a polypeptide encoded by a gene characteristic of cancer or an antibody to a polypeptide. Yet other kits comprise compounds identified herein as agonists or antagonists of genes characteristic of human lung tumor tissue. The compositions may be pharmaceutical compositions comprising a pharmaceutically acceptable excipient.

EXAMPLES

It will be apparent to those skilled in the art that the examples and embodiments described herein are by way of illustration and not of limitation, and that other examples may be used without departing from the spirit and scope of the present invention, as set forth in the claims.

Example 1. Gene Expression Profiling Protocol

A. Tissue Source

Human lung tumor tissue and normal adjacent tissue were purchased from the National Disease Research Institute.

B. RNA Extraction and cRNA Preparation

Total RNA was extracted from the human tissues using TRIzol reagent (Life Technologies, MD) according to a modified vendor protocol which utilizes the RNeasy protocol (Qiagen, Calif.). After homogenization with a Brinkmann Polytron PT10/35 (Brinkmann, Switzerland) and phase separation with chloroform, samples were applied to RNeasy columns. RNA samples were treated with DNase I using RNase-free DNase Set (Qiagen, Calif.).

After elution and quantitation with UV spectrophotometry, each sample was reverse transcribed into double-stranded cDNA using the Gibco Superscript II Choice System for RT-PCR according to vendor protocol (Invitrogen, Calif.).

Samples were organically extracted and ethanol precipitated. Approximately 1 μg cDNA was then used in an in vitro transcription reaction incorporating biotinylated nucleotides using an RNA labeling kit (Enzo Diagnostics, NY). The resulting cRNA was put through the RNeasy clean-up protocol and then quantified using UV spectrophotometry. The cRNA (15 μg) was fragmented in the presence of MgOAc and KOAc at 94° C. Fragmented RNA (10 μg) was loaded onto each array, one cRNA sample per array. Arrays were hybridized for 16 hours at 45° C. rotating at 60 rpm in an Affymetrix GeneChip Hybridization Oven 640.

C. Microarray Suite 5.0 Analysis

Following hybridization, arrays were stained with Phycoerythrin-conjugated Streptavidin, placed in an Agilent GeneArray Scanner and then exposed to a 488 nm laser, causing excitation of the phycoerythrin. The Microarray Suite 5.0 software digitally converts the intensity of light given off by the array into a numeric value indicative of levels of gene expression. Because each array represents a single sample, tumor RNA was compared to the RNA isolated from normal adjacent tissue.

D. Spotfire Analysis

The goal is to generate sets of markers to distinguish between lung cancer and normal tissues. This marker set represents a set of probe sets that is an optimum set for the prediction of whether or not a lung tissue is cancerous using a support vector machine. The optimal set is determined to be the one that shows the greatest prediction accuracy with the least error. This marker set was derived using the following method:

1. The data was imported into Spotfire.
2. A Treatment Comparison between cancer and normal tissues was performed using the t-test option.
3. The following criteria were used to select the probe sets:
    a. The data showed that the probe sets were all not “Absent,” as determined by the Affymetrix Microarray Suite software v. 5.0 (Affymetrix, Santa Clara, Calif.) for either all of the normal or all of the cancer samples.
    b. The data for the probe set showed a p-value of less than or equal to 0.001 according to the t-test.
4. All probe sets not meeting these criteria were eliminated from further analysis.
5. The remaining data was used in a selection process using custom software in conjunction with a modified version of the svm-train program (C++ version) which is part of LIBSVM (Chang and Lin, LIBSVM: A Library for Support Vector Machines, 2001. Software available from http://www.csie.ntu.edu.tw/~cjlin/libsvm). The custom software was written in the Perl language v. 5.004. The software was run on an SGI Origin 2000 running the IRIX 6.5.7f operating system. This software was used in the following manner:
    a. The Perl program was used as a “wrapper” to control svm-train. Its functions were to select subsets of the data and feed these sets to svm-train for training support vector machines (SVM).
    b. Training consisted of many elimination rounds. During each round many support vector machines were trained using ten fold cross validation in order to estimate accuracy and error. Each SVM was trained on all data except that the data from one probe set was left out. One probe set was eliminated from the data set at each round. This was the probe set that showed the best error and/or accuracy for the SVM when it was eliminated.
    c. Training continued until there was only one probe set left.
    d. The set of probe sets that showed the greatest accuracy with the least error was selected and is shown in Table 1.
    e. The input arguments to svm-train were -s 0 -t 0 -c 1 -v 10.
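By way of illustration and not of limitation, the selection criteria and the elimination rounds described above may be sketched as follows. The sketch is not the Perl wrapper or the modified svm-train program referred to above. A nearest-centroid classifier is substituted for the linear support vector machine so that the example needs only the Python standard library. Folds are assigned by sample index, ties are broken in favor of the smaller probe set, and criterion 3a is read as requiring that no "A" (Absent) call occur in at least one of the two tissue classes. All function names are hypothetical. In LIBSVM, the arguments -s 0 -t 0 -c 1 -v 10 select C-SVC, a linear kernel, a cost of 1, and ten-fold cross validation.

```python
from statistics import mean


def passes_filter(calls_normal, calls_cancer, p_value, alpha=0.001):
    """Criteria 3a and 3b: no 'A' call in one whole class, and p <= alpha."""
    never_absent = all(c != "A" for c in calls_normal) or all(
        c != "A" for c in calls_cancer
    )
    return never_absent and p_value <= alpha


def centroid_cv_accuracy(samples, labels, features, folds=10):
    """Cross-validated accuracy of a nearest-centroid classifier.

    This classifier is a stand-in for the linear SVM, not the method used.
    samples: list of dicts mapping probe set name to expression value.
    """
    correct = 0
    for fold in range(folds):
        test_idx = [i for i in range(len(samples)) if i % folds == fold]
        train_idx = [i for i in range(len(samples)) if i % folds != fold]
        centroids = {}
        for label in sorted(set(labels[i] for i in train_idx)):
            rows = [samples[i] for i in train_idx if labels[i] == label]
            centroids[label] = {f: mean(r[f] for r in rows) for f in features}
        for i in test_idx:
            def distance(label):
                return sum(
                    (samples[i][f] - centroids[label][f]) ** 2 for f in features
                )
            correct += min(centroids, key=distance) == labels[i]
    return correct / len(samples)


def backward_elimination(samples, labels, features, score=centroid_cv_accuracy):
    """Remove one probe set per round until a single probe set remains.

    In each round every probe set is left out in turn, and the one whose
    removal gives the best score is eliminated. The best-scoring subset
    seen across all rounds is returned together with its score.
    """
    current = list(features)
    best_subset, best_score = list(current), score(samples, labels, current)
    while len(current) > 1:
        trials = [
            (score(samples, labels, [f for f in current if f != left_out]), left_out)
            for left_out in current
        ]
        round_score, removed = max(trials, key=lambda trial: trial[0])
        current.remove(removed)
        if round_score >= best_score:  # on ties, prefer the smaller set
            best_subset, best_score = list(current), round_score
    return best_subset, best_score
```

The `score` argument is left pluggable so that a cross-validated SVM accuracy, such as the figure reported by svm-train with the -v option, could be supplied in place of the stand-in classifier.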

Marker Set Two (Table 2) represents a set of probe sets that is an optimum set for the prediction of whether or not a lung tissue is cancerous using a support vector machine. The optimal set is determined to be the one that shows the greatest prediction accuracy with the least error. This marker set was derived using the method described for Marker Set 1 with the following exceptions:

1. The data set was not limited to those probe sets that showed a t-test p-value of less than or equal to 0.001.
2. Five percent of the probe sets were eliminated at each round until 1000 probe sets remained. Then, only one probe set was eliminated during each round.
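The two-phase elimination schedule in the second exception can be illustrated with a short Python sketch. The function name `elimination_schedule` is hypothetical, and two details are assumptions not stated in the text: the five percent is rounded down to a whole number of probe sets, and the last five-percent round is trimmed so that it stops at exactly 1000 probe sets. As in the method for Marker Set One, rounds continue until one probe set is left.

```python
def elimination_schedule(n_probe_sets, percent=5, switch_at=1000):
    """Yield the number of probe sets to eliminate in each round."""
    remaining = n_probe_sets
    while remaining > 1:
        if remaining > switch_at:
            # Phase 1: drop five percent per round (integer arithmetic, rounded down).
            drop = max(1, remaining * percent // 100)
            # Assumption: never go below the switch-over point in phase 1.
            drop = min(drop, remaining - switch_at)
        else:
            # Phase 2: drop a single probe set per round.
            drop = 1
        yield drop
        remaining -= drop
```

For example, starting from 1200 probe sets, the first rounds would remove 60, 57, 54, and 29 probe sets, after which each round removes one.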

Marker Set Three (Table 3) represents a set of probe sets that is an optimum set for predicting whether or not a lung tissue is cancerous using a support vector machine. The optimal set is the one that shows the greatest prediction accuracy. This marker set is a subset of the set listed in Table 2. This subset had the same level of accuracy (100%) as the set listed in Table 2, with an error that was only 0.448% greater.

All three marker sets could select which lung tissues were cancerous and which were normal with 100% accuracy using their respective methods as determined by ten fold cross validation.

TABLE 1
Lung Tumor Marker (Set One)

| SEQ ID NO (DNA) | SEQ ID NO (Protein) | Probe Set | Gene Symbol | Title | Genbank Accession | Unigene Cluster |
|---|---|---|---|---|---|---|
| 1 | 2 | 205819_at | MARCO | macrophage receptor with collagenous structure | NM_006770.1 | Hs.67726 |
| 3 | 4 | 201935_s_at | EIF4G3 | eukaryotic translation initiation factor 4 gamma, 3 | AI768122 | Hs.25732 |
| 5 | 6 | 200694_s_at | DDX24 | DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 24 | NM_020414.2 | Hs.155986 |
| 7 | 8 | 212311_at | KIAA0746 | KIAA0746 protein | AB018289.1 | Hs.49500 |
| 9 | 10 | 207843_x_at | CYB5 | cytochrome b-5 | NM_001914.1 | Hs.83834 |
| 11 | 12 | 200966_x_at | ALDOA | aldolase A, fructose-bisphosphate | NM_000034.1 | Hs.273415 |
| 13 | 14 | 48031_r_at | C5orf4 | chromosome 5 open reading frame 4 | H93077 | Hs.10235 |
| 15 | 16 | 206200_s_at | ANXA11 | annexin A11 | NM_001157.1 | Hs.75510 |
| 17 | 18 | 205812_s_at | SULT1C2 | sulfotransferase family, cytosolic, 1C, member 2 | NM_006588.1 | Hs.312644 |
| 19 | 20 | 210616_s_at | KIAA0905 | yeast Sec31p homolog | AB020712.1 | Hs.70266 |
| 21 | 22 | 200672_x_at | SPTBN1 | spectrin, beta, non-erythrocytic 1 | NM_003128.1 | Hs.107164 |
| 23 | 24 | 215838_at | ILT11 | leukocyte immunoglobulin-like receptor, subfamily B (with TM and ITIM domains), member 7 | AF212842.1 | Hs.375022 |
| 25 | 26 | 212581_x_at | GAPD | glyceraldehyde-3-phosphate dehydrogenase | BE561479 | Hs.169476 |
| 27 | 28 | 55662_at | FLJ13114 | hypothetical protein FLJ13114 | H27225 | Hs.9444 |
| 29 | 30 | 209124_at | MYD88 | myeloid differentiation primary response gene (88) | U70451.1 | Hs.82116 |
| 31 | 32 | 221031_s_at | DKFZP434F0318 | hypothetical protein DKFZp434F0318 | NM_030817.1 | Hs.23388 |
| 33 | 34 | 210299_s_at | FHL1 | four and a half LIM domains 1 | AF063002.1 | Hs.239069 |
| 35 | 36 | 220367_s_at | FLJ12761 | hypothetical protein FLJ12761 | NM_024545.1 | Hs.10554 |
| 37 | 38 | 201535_at | UBL3 | ubiquitin-like 3 | NM_007106.1 | Hs.173091 |
| 39 | 40 | 200843_s_at | EPRS | glutamyl-prolyl-tRNA synthetase | NM_004446.1 | Hs.55921 |
| 41 | 42 | 217294_s_at | ENO1 | enolase 1, (alpha) | U88968.1 | Hs.381173 |
| 43 | 44 | 217234_s_at | VIL2 | villin 2 (ezrin) | AF199015.1 | Hs.155191 |
| 45 | 46 | 208881_x_at | IDI1 | isopentenyl-diphosphate delta isomerase | BC005247.1 | Hs.76038 |
| 47 | 48 | 202310_s_at | COL1A1 | collagen, type I, alpha 1 | NM_000088.1 | Hs.172928 |
| 49 | 50 | 213438_at | | Homo sapiens cDNA FLJ34019 fis, clone FCBBF2002898 | AA995925 | Hs.7309 |
| 51 | 52 | 202857_at | TMEM4 | transmembrane protein 4 | NM_014255.1 | Hs.8752 |
| 53 | 54 | 219282_s_at | TRPV2 | transient receptor potential cation channel, subfamily V, member 2 | NM_015930.1 | Hs.279746 |
| 55 | 56 | 201432_at | CAT | catalase | NM_001752.1 | Hs.76359 |
| 57 | 58 | 202150_s_at | HEF1 | enhancer of filamentation 1 (cas-like docking; Crk-associated substrate related) | U64317.1 | Hs.80261 |
| 59 | 60 | 207265_s_at | KDELR3 | KDEL (Lys-Asp-Glu-Leu) endoplasmic reticulum protein retention receptor 3 | NM_016657.1 | Hs.250696 |
| 61 | 62 | 208884_s_at | DD5 | progestin induced protein | AF006010.1 | Hs.278428 |
| 63 | 64 | 216037_x_at | TCF7L2 | transcription factor 7-like 2 (T-cell specific, HMG-box) | AA664011 | Hs.348412 |
| 65 | 66 | 214752_x_at | FLNA | filamin A, alpha (actin binding protein 280) | AI625550 | Hs.195464 |
| 67 | 68 | 203883_s_at | Rab11-FIP2 | KIAA0941 protein | BG249608 | Hs.173656 |
| 69 | 70 | 205382_s_at | DF | D component of complement (adipsin) | NM_001928.1 | Hs.155597 |
| 71 | 72 | 204063_s_at | ULK2 | unc-51-like kinase 2 (C. elegans) | NM_014683.1 | Hs.151406 |
| 73 | 352 | 204661_at | CDW52 | CDW52 antigen (CAMPATH-1 antigen) | NM_001803.1 | Hs.276770 |
| 74 | 75 | 210950_s_at | FDFT1 | farnesyl-diphosphate farnesyltransferase 1 | BC003573.1 | Hs.48876 |
| 76 | 77 | 202085_at | TJP2 | tight junction protein 2 (zona occludens 2) | NM_004817.1 | Hs.75608 |
| 78 | 79 | 201997_s_at | SHARP | SMART/HDAC1 associated repressor protein | NM_015001.1 | Hs.184245 |
| 80 | 81 | 221896_s_at | DKFZP564K247 | DKFZP564K247 protein | BE739519 | Hs.7917 |
| 82 | 83 | 206283_s_at | TAL1 | T-cell acute lymphocytic leukemia 1 | NM_003189.1 | Hs.73828 |
| 84 | 85 | 203186_s_at | S100A4 | S100 calcium binding protein A4 (calcium protein, calvasculin, metastasin, murine placental homolog) | NM_002961.2 | Hs.81256 |
| 86 | 87 | 213056_at | KIAA1013 | KIAA1013 protein | AU145019 | Hs.96427 |
| 88 | 89 | 208622_s_at | VIL2 | villin 2 (ezrin) | NM_003379.2 | Hs.155191 |
| 90 | 91 | 205357_s_at | AGTR1 | angiotensin II receptor, type 1 | NM_000685.2 | Hs.89472 |
| 92 | 93 | 214366_s_at | ALOX5 | arachidonate 5-lipoxygenase | AA995910 | Hs.89499 |
| 94 | 95 | 214135_at | CLDN18 | claudin 18 | BE551219 | Hs.278966 |

TABLE 2 Lung Tumor Marker (Set Two) SEQ SEQ ID NO ID NO Genbank Unigene (DNA) (Protein) Probe Set Gene Symbol Title Accession Cluster 96 97 219065_s_at LOC51072 C21 orf19-like protein NM_015955.1 Hs.20814 98 99 213627_at Homo sapiens cDNA FLJ33684 fis, clone BRAWH2002630, AI924630 Hs.376719 highly similar to Human hepatocellular carcinoma associated protein (JCL-1) mRNA 100 101 216438_s_at AL133228 102 103 209553_at KIAA0804 KIAA0804 protein BC001001.2 Hs.7316 104 105 202911_at MSH6 mutS homolog 6 (E. coli) NM_000179.1 Hs.3248 106 107 214167_s_at RPLP0 ribosomal protein, large, P0 AA555113 Hs.350108 108 109 202690_s_at SNRPD1 small nuclear ribonucleoprotein D1 polypeptide 16 kDa BC001721.1 Hs.86948 110 111 202345_s_at FABP5 fatty acid binding protein 5 (psoriasis-associated) NM_001444.1 Hs.153179 112 113 34031_i_at CCM1 cerebral cavernous malformations 1 U90268 Hs.93810 114 115 212944_at Homo sapiens cDNA: FLJ21243 fis, clone COL01164 AK024896.1 Hs.268016 116 117 217806_s_at DKFZP586F1524 DKFZP586F1524 protein NM_015584.1 Hs.241543 118 119 210849_s_at VPS41 vacuolar protein sorting 41 (yeast) AF135593.1 Hs.180941 120 121 213491_x_at RPN2 ribophorin II AL514285 Hs.75722 122 123 200999_s_at CKAP4 cytoskeleton-associated protein 4 NM_006825.1 Hs.74368 124 125 209222_s_at OSBPL2 oxysterol binding protein-like 2 BC000296.1 Hs.15519 126 127 215121_x_at IGL immunoglobulin lambda locus AA680302 Hs.181125 128 129 204675_at SRD5A1 steroid-5-alpha-reductase, alpha polypeptide 1 (3-oxo-5 NM_001047.1 Hs.552 alpha-steroid delta 4-dehydrogenase alpha 1) 130 131 205786_s_at ITGAM integrin, alpha M (complement component receptor 3, alpha; NM_000632.2 Hs.172631 also known as CD11b (p170), macrophage antigen alpha polypeptide) 132 133 205133_s_at HSPE1 heat shock 10 kDa protein 1 (chaperonin 10) NM_002157.1 Hs.1197 134 135 201381_x_at SIP Siah-interacting protein AF057356.1 Hs.27258 136 137 215176_x_at IGKC immunoglobulin kappa constant AW404894 Hs.156110 138 139 211714_x_at 
DKFZp434N0650 hypothetical protein DKFZp434N0650 BC005838.1 Hs.179661 140 141 200650_s_at LDHA lactate dehydrogenase A NM_005566.1 Hs.2795 142 143 209433_s_at PPAT phosphoribosyl pyrophosphate amidotransferase AI457120 Hs.311 144 145 205194_at PSPH phosphoserine phosphatase NM_004577.1 Hs.56407 146 147 219654_at PTPLA protein tyrosine phosphatase-like (proline instead of NM_014241.1 Hs.114062 catalytic arginine), member a 148 149 201496_x_at MYH11 myosin, heavy polypeptide 11, smooth muscle AI889739 Hs.78344 150 151 214039_s_at LC27 putative integral membrane transporter T15777 Hs.296398 152 153 212137_at MGC19556 hypothetical protein MGC19556 AV746402 Hs.334787 154 155 212174_at AK2 adenylate kinase 2 AK023758.1 Hs.171811 156 157 209170_s_at GPM6B glycoprotein M6B AF016004.1 Hs.5422 158 159 218397_at FLJ10335 hypothetical protein FLJ10335 NM_018062.1 Hs.279841 160 161 203738_at FLJ11193 hypothetical protein FLJ11193 AI421192 Hs.151046 162 163 202998_s_at LOXL2 lysyl oxidase-like 2 NM_002318.1 Hs.83354 164 165 201021_s_at DSTN destrin (actin depolymerizing factor) BF697964 Hs.82306 166 167 209354_at TNFRSF14 tumor necrosis factor receptor superfamily, BC002794.1 Hs.279899 member 14 (herpesvirus entry mediator) 168 169 204269_at PIM2 pim-2 oncogene NM_006875.1 Hs.80205 170 171 200976_s_at TAX1BP1 Tax1 (human T-cell leukemia virus type I) binding protein 1 NM_006024.2 Hs.5437 172 173 218883_s_at FLJ23468 hypothetical protein FLJ23468 NM_024629.1 Hs.38178 174 175 201425_at ALDH2 aldehyde dehydrogenase 2 family (mitochondrial) NM_000690.1 Hs.195432 176 177 212489_at COL5A1 collagen, type V, alpha 1 AI983428 Hs.146428 178 179 201422_at IFI30 interferon, gamma-inducible protein 30 NM_006332.1 Hs.14623 180 181 205273_s_at MP1 metalloprotease 1 (pitrilysin family) NM_014968.1 Hs.260116 182 183 203854_at IF I factor (complement) NM_000204.1 Hs.36602 184 185 201059_at EMS1 ems1 sequence (mammary tumor and squamous cell NM_005231.1 Hs.119257 carcinoma-associated (p80/85 src 
substrate) 186 187 203233_at IL4R interleukin 4 receptor NM_000418.1 Hs.75545 188 189 213365_at KIAA1504 KIAA1504 protein N64622 Hs.157426 190 191 202722_s_at GFPT1 glutamine-fructose-6-phosphate transaminase 1 NM_002056.1 Hs.1674 192 193 220032_at FLJ21986 hypothetical protein FLJ21986 NM_024913.1 Hs.255416 194 195 202403_s_at COL1A2 collagen, type I, alpha 2 AA788711 Hs.179573 196 197 217871_s_at MIF macrophage migration inhibitory factor NM_002415.1 Hs.73798 (glycosylation-inhibiting factor) 198 199 205173_x_at CD58 CD58 antigen, (lymphocyte function-associated antigen 3) NM_001779.1 Hs.75626 200 201 207812_s_at GORASP2 golgi reassembly stacking protein 2, 55 kDa NM_015530.1 Hs.6880 202 203 218003_s_at FKBP3 FK506 binding protein 3, 25 kDa NM_002013.1 Hs.350402 204 205 201672_s_at USP14 ubiquitin specific protease 14 (tRNA-guanine transglycosylase) NM_005151.1 Hs.75981 206 207 201264_at COPE coatomer protein complex, subunit epsilon NM_007263.1 Hs.10326 208 209 208623_s_at VIL2 villin 2 (ezrin) J05021.1 Hs.155191 210 211 202243_s_at PSMB4 proteasome (prosome, macropain) subunit, beta type, 4 NM_002796.1 Hs.89545 212 213 208948_s_at STAU staufen, RNA binding protein (Drosophila) BC000830.1 Hs.6113 214 215 220044_x_at LUC7A cisplatin resistance-associated overexpressed protein NM_016424.1 Hs.3688 216 217 205873_at PIGL phosphatidylinositol glycan, class L NM_004278.1 Hs.27008 218 219 217791_s_at PYCS pyrroline-5-carboxylate synthetase NM_002860.1 Hs.114366 (glutamate gamma-semialdehyde synthetase) 220 221 212865_s_at COL14A1 collagen, type XIV, alpha 1 (undulin) BF449063 Hs.36131 222 223 212237_at KIAA0978 KIAA0978 protein AL117518.1 Hs.3686 224 225 209969_s_at STAT1 signal transducer and activator of transcription 1, 91 kDa BC002704.1 Hs.21486 226 227 213693_s_at MUC1 mucin 1, transmembrane AI610869 Hs.89603 228 229 201973_s_at LOC51622 CGI-43 protein AL550875 Hs.289112 230 231 202948_at IL1R1 interleukin 1 receptor, type I NM_000877.1 Hs.82112 232 233 
204619_s_at CSPG2 chondroitin sulfate proteoglycan 2 (versican) BF590263 Hs.81800 234 235 204017_at KDELR3 KDEL (Lys-Asp-Glu-Leu) endoplasmic reticulum protein NM_006855.2 Hs.250696 retention receptor 3 236 237 201231_s_at ENO1 enolase 1, (alpha) NM_001428.1 Hs.254105 238 239 200699_at ESTs, Weakly similar to A43932 mucin 2 precursor, NM_006854.2 Hs.380986 intestinal - human (fragments) [H. sapiens] 240 241 200594_x_at HNRPU heterogeneous nuclear ribonucleoprotein U NM_004501.1 Hs.103804 (scaffold attachment factor A) 242 243 218194_at DKFZP566E144 small fragment nuclease NM_015523.1 Hs.7527 244 245 212842_x_at RANBP2L1 RAN binding protein 2-like 1 AL043571 Hs.179825 246 247 216640_s_at P5 protein disulfide isomerase-related protein AK026926.1 Hs.182429 248 249 212041_at ATP6V0D1 ATPase, H+ transporting, lysosomal 38 kDa, V0 AL566172 Hs.106876 subunit d isoform 1 250 251 218167_at LOC51321 hypothetical protein LOC51321 NM_016627.1 Hs.268122 252 253 209274_s_at MGC4276 hypothetical protein MGC4276 similar to CG8198 BC002675.1 Hs.177776 254 255 208742_s_at SAP18 sin3-associated polypeptide, 18 kDa U78303.1 Hs.23964 256 257 205034_at CCNE2 cyclin E2 NM_004702.1 Hs.30464 258 259 218795_at ACP6 lysophosphatidic acid phosphatase NM_016361.1 Hs.15871 260 261 219382_at RBT1 RPA-binding trans-activator NM_013368.1 Hs.169138 262 263 221865_at FLJ38045 hypothetical protein FLJ38045 BF969986 Hs.170226 264 265 209345_s_at PI4KII phosphatidylinositol 4-kinase type II AL561930 Hs.25300 266 267 207543_s_at P4HA1 procollagen-proline, 2-oxoglutarate 4-dioxygenase NM_000917.1 Hs.76768 (proline 4-hydroxylase), alpha polypeptide I 268 269 201193_at IDH1 isocitrate dehydrogenase 1 (NADP+), soluble NM_005896.1 Hs.11223 270 271 216041_x_at GRN granulin AK023348.1 Hs.180577 272 273 208843_s_at GORASP2 golgi reassembly stacking protein 2, 55 kDa BC001408.1 Hs.6880 274 275 200749_at RAN RAN, member RAS oncogene family BF112006 Hs.10842 276 277 60471_at RIN3 Ras and Rab interactor 3 AA625133 
Hs.180040 278 279 221485_at B4GALT5 UDP-Gal: betaGlcNAc beta NM_004776.1 Hs.107526 1,4-galactosyltransferase, polypeptide 5 280 281 214002_at MYL6 myosin, light polypeptide 6, AA419227 Hs.77385 alkali, smooth muscle and non-muscle 282 283 218723_s_at RGC32 RGC32 protein NM_014059.1 Hs.76640 284 285 214669_x_at Homo sapiens isolate donor N clone N168K BG485135 Hs.306357 immunoglobulin kappa light chain variable region mRNA, partial cds 286 287 218218_at DIP13B DIP13 beta NM_018171.1 Hs.107882 288 289 217949_s_at IMAGE3455200 hypothetical protein IMAGE3455200 NM_024006.1 Hs.324844 290 291 203710_at ITPR1 inositol 1,4,5-triphosphate receptor, type 1 NM_002222.1 Hs.198443 292 293 220661_s_at FLJ20531 hypothetical protein FLJ20531 NM_017865.1 Hs.23617 294 295 203458_at SPR sepiapterin reductase (7,8-dihydrobiopterin: AI951454 Hs.301540 NADP+ oxidoreductase) 296 297 216207_x_at IGKC immunoglobulin kappa constant AW408194 Hs.156110 298 299 201377_at KIAA0144 KIAA0144 gene product NM_014847.1 Hs.8127 300 301 208073_x_at TTC3 tetratricopeptide repeat domain 3 NM_003316.1 Hs.118174 71 72 204063_s_at ULK2 unc-51-like kinase 2 (C. elegans) NM_014683.1 Hs.151406 302 303 212160_at ESTs, Highly similar to XPOT_HUMAN Exportin T AI984005 Hs.380785 (tRNA exportin) (Exportin(tRNA)) [H. 
sapiens] 304 305 218205_s_at MKNK2 MAP kinase-interacting serine/threonine kinase 2 NM_017572.1 Hs.261828 306 307 204781_s_at TNFRSF6 tumor necrosis factor receptor superfamily, member 6 NM_000043.1 Hs.82359 308 309 202404_s_at COL1A2 collagen, type I, alpha 2 NM_000089.1 Hs.179573 310 311 204615_x_at IDI1 isopentenyl-diphosphate delta isomerase NM_004508.1 Hs.76038 312 313 200068_s_at CANX calnexin M94859.1 Hs.155560 314 315 212595_s_at DAZAP2 DAZ associated protein 2 AL534321 Hs.75416 316 317 219458_s_at FLJ22609 hypothetical protein FLJ22609 NM_022072.1 Hs.18740 318 319 204787_at Z39IG Ig superfamily protein NM_007268.1 Hs.8904 320 321 202034_x_at RB1CC1 RB1-inducible coiled-coil 1 NM_014781.1 Hs.50421 322 323 201178_at FBXO7 F-box only protein 7 NM_012179.1 Hs.5912 324 325 214687_x_at ALDOA aldolase A, fructose-bisphosphate AK026577.1 Hs.273415 33 34 210299_s_at FHL1 four and a half LIM domains 1 AF063002.1 Hs.239069 326 327 202146_at IFRD1 interferon-related developmental regulator 1 AA747426 Hs.7879 328 329 218241_at GOLGA5 golgi autoantigen, golgin subfamily a, 5 NM_005113.1 Hs.241572 330 331 221004_s_at ITM3 integral membrane protein 3 NM_030926.1 Hs.111577 332 333 201160_s_at CSDA cold shock domain protein A AL556190 Hs.198726 334 335 211139_s_at NAB1 NGFI-A binding protein 1 (EGR1 binding protein 1) AF045452.1 Hs.107474 336 337 214211_at FTH1 ferritin, heavy polypeptide 1 AA083483 Hs.62954 93 94 214366_s_at ALOX5 arachidonate 5-lipoxygenase AA995910 Hs.89499 338 339 204271_s_at EDNRB endothelin receptor type B M74921.1 Hs.82002 3 4 200966_x_at ALDOA aldolase A, fructose-bisphosphate NM_000034.1 Hs.273415 340 341 200711_s_at TCEB1L transcription elongation factor B (SIII), polypeptide 1-like NM_003197.2 Hs.171626 342 343 218107_at FLJ21016 hypothetical protein FLJ21016 NM_025160.1 Hs.289069 344 345 212433_x_at RPS2 ribosomal protein S2 AA630314 Hs.356360 346 347 202953_at C1QB complement component 1, q subcomponent, beta polypeptide NM_000491.2 Hs.8986 348 
349 219117_s_at FKBP11 FK506 binding protein 11, 19 kDa NM_016594.1 Hs.24048 350 351 221704_s_at FLJ12750 hypothetical protein FLJ12750 BC005882.1 Hs.77870 353 354 202375_at SEC24D SEC24 related gene family, member D (S. cerevisiae) NM_014822.1 Hs.19822 355 356 202983_at SMARCA3 SWI/SNF related, matrix associated, actin dependent AI760760 Hs.3068 regulator of chromatin, subfamily a, member 3 357 358 200652_at SSR2 signal sequence receptor, beta NM_003145.2 Hs.74564 (translocon-associated protein beta) 359 360 209032_s_at IGSF4 immunoglobulin superfamily, member 4 AF132811.1 Hs.70337 361 362 203845_at PCAF p300/CBP-associated factor AV727449 Hs.199061 363 364 209833_at CRADD CASP2 and RIPK1 domain containing adaptor U79115.1 Hs.155566 with death domain 95 96 214135_at CLDN18 claudin 18 BE551219 Hs.278966 365 366 201063_at RCN1 reticulocalbin 1, EF-hand calcium binding domain NM_002901.1 Hs.167791 367 368 205661_s_at PP591 hypothetical protein PP591 NM_025207.1 Hs.118666 369 370 218130_at MGC4368 hypothetical protein MGC4368 NM_024510.1 Hs.9732 371 372 212285_s_at AGRN agrin AF016903.1 Hs.273330 67 68 203883_s_at Rab11-FIP2 KIAA0941 protein BG249608 Hs.173656 373 374 218163_at MCT-1 MCT-1 protein NM_014060.1 Hs.102696 45 46 208881_x_at IDI1 isopentenyl-diphosphate delta isomerase BC005247.1 Hs.76038 375 376 200705_s_at EEF1B2 eukaryotic translation elongation factor 1 beta 2 NM_001959.1 Hs.275959 377 378 202386_s_at Homo sapiens KIAA0430 gene product (KIAA0430), mRNA NM_019081.1 Hs.30909 379 380 212143_s_at IGFBP3 insulin-like growth factor binding protein 3 NM_000598.1 Hs.77326 79 80 201997_s_at SHARP SMART/HDAC1 associated repressor protein NM_015001.1 Hs.184245 381 382 204396_s_at GPRK5 G protein-coupled receptor kinase 5 NM_005308.1 Hs.211569 383 384 200599_s_at TRA1 tumor rejection antigen (gp96) 1 NM_003299.1 Hs.82689 55 56 201432_at CAT catalase NM_001752.1 Hs.76359 65 66 214752_x_at FLNA filamin A, alpha (actin binding protein 280) AI625550 Hs.195464 53 54 
219282_s_at TRPV2 transient receptor potential cation channel, NM_015930.1 Hs.279746 subfamily V, member 2 385 386 203802_x_at WBSCR20A Williams Beuren syndrome chromosome region 20A NM_018044.1 Hs.272820 387 388 202732_at PKIG protein kinase (cAMP-dependent catalytic) inhibitor gamma NM_007066.1 Hs.3407 389 390 202766_s_at FBN1 fibrillin 1 (Marfan syndrome) NM_000138.1 Hs.750 391 392 201091_s_at CBX3 chromobox homolog 3 (HP1 gamma homolog, Drosophila) BE748755 Hs.278554 393 394 201874_at FLJ21047 hypothetical protein FLJ21047 BF978611 Hs.14891 395 396 204867_at GCHFR GTP cyclohydrolase I feedback regulatory protein NM_005258.2 Hs.83081 397 398 206875_s_at SLK Ste20-related serine/threonine kinase NM_014720.1 Hs.105751 399 400 210645_s_at TTC3 tetratricopeptide repeat domain 3 D83077.1 Hs.118174 47 48 202310_s_at COL1A1 collagen, type I, alpha 1 NM_000088.1 Hs.172928 37 38 201535_at UBL3 ubiquitin-like 3 NM_007106.1 Hs.173091

TABLE 3 Lung Tumor Marker (Set Three) SEQ SEQ ID NO ID NO Gene Genbank Unigene (DNA) (Protein) Probe Set Symbol Title Accession Cluster 226 227 213693_s_at MUC1 mucin 1, transmembrane AI610869 Hs.89603 228 229 201973_s_at LOC51622 CGI-43 protein AL550875 Hs.289112 230 231 202948_at IL1R1 interleukin 1 receptor, type I NM_000877.1 Hs.82112 232 233 204619_s_at CSPG2 chondroitin sulfate proteoglycan 2 (versican) BF590263 Hs.81800 234 235 204017_at KDELR3 KDEL (Lys-Asp-Glu-Leu) endoplasmic reticulum protein NM_006855.2 Hs.250696 retention receptor 3 236 237 201231_s_at ENO1 enolase 1, (alpha) NM_001428.1 Hs.254105 238 239 200699_at ESTs, Weakly similar to A43932 mucin 2 precursor, NM_006854.2 Hs.380986 intestinal - human (fragments) [H. sapiens] 240 241 200594_x_at HNRPU heterogeneous nuclear ribonucleoprotein U NM_004501.1 Hs.103804 (scaffold attachment factor A) 242 243 218194_at DKFZP566E144 small fragment nuclease NM_015523.1 Hs.7527 244 245 212842_x_at RANBP2L1 RAN binding protein 2-like 1 AL043571 Hs.179825 246 247 216640_s_at P5 protein disulfide isomerase-related protein AK026926.1 Hs.182429 248 249 212041_at ATP6V0D1 ATPase, H+ transporting, lysosomal 38 kDa, V0 AL566172 Hs.106876 subunit d isoform 1 250 251 218167_at LOC51321 hypothetical protein LOC51321 NM_016627.1 Hs.268122 252 253 209274_s_at MGC4276 hypothetical protein MGC4276 similar to CG8198 BC002675.1 Hs.177776 254 255 208742_s_at SAP18 sin3-associated polypeptide, 18 kDa U78303.1 Hs.23964 256 257 205034_at CCNE2 cyclin E2 NM_004702.1 Hs.30464 258 259 218795_at ACP6 lysophosphatidic acid phosphatase NM_016361.1 Hs.15871 260 261 219382_at RBT1 RPA-binding trans-activator NM_013368.1 Hs.169138 262 263 221865_at FLJ38045 hypothetical protein FLJ38045 BF969986 Hs.170226 264 265 209345_s_at PI4KII phosphatidylinositol 4-kinase type II AL561930 Hs.25300 266 267 207543_s_at P4HA1 procollagen-proline, 2-oxoglutarate 4-dioxygenase NM_000917.1 Hs.76768 (proline 4-hydroxylase), alpha polypeptide I 268 269 
201193_at IDH1 isocitrate dehydrogenase 1 (NADP+), soluble NM_005896.1 Hs.11223 270 271 216041_x_at GRN granulin AK023348.1 Hs.180577 272 273 208843_s_at GORASP2 golgi reassembly stacking protein 2, 55 kDa BC001408.1 Hs.6880 274 275 200749_at RAN RAN, member RAS oncogene family BF112006 Hs.10842 276 277 60471_at RIN3 Ras and Rab interactor 3 AA625133 Hs.180040 278 279 221485_at B4GALT5 UDP-Gal: betaGlcNAc beta 1,4-galactosyltransferase, NM_004776.1 Hs.107526 polypeptide 5 280 281 214002_at MYL6 myosin, light polypeptide 6, alkali, smooth AA419227 Hs.77385 muscle and non-muscle 282 283 218723_s_at RGC32 RGC32 protein NM_014059.1 Hs.76640 284 285 214669_x_at Homo sapiens isolate donor N clone N168K BG485135 Hs.306357 immunoglobulin kappa light chain variable region mRNA, partial cds 286 287 218218_at DIP13B DIP13 beta NM_018171.1 Hs.107882 288 289 217949_s_at IMAGE3455200 hypothetical protein IMAGE3455200 NM_024006.1 Hs.324844 290 291 203710_at ITPR1 inositol 1,4,5-triphosphate receptor, type 1 NM_002222.1 Hs.198443 292 293 220661_s_at FLJ20531 hypothetical protein FLJ20531 NM_017865.1 Hs.23617 294 295 203458_at SPR sepiapterin reductase (7,8-dihydrobiopterin: AI951454 Hs.301540 NADP+ oxidoreductase) 296 297 216207_x_at IGKC immunoglobulin kappa constant AW408194 Hs.156110 298 299 201377_at KIAA0144 KIAA0144 gene product NM_014847.1 Hs.8127 300 301 208073_x_at TTC3 tetratricopeptide repeat domain 3 NM_003316.1 Hs.118174 71 72 204063_s_at ULK2 unc-51-like kinase 2 (C. elegans) NM_014683.1 Hs.151406 302 303 212160_at ESTs, Highly similar to XPOT_HUMAN Exportin AI984005 Hs.380785 T (tRNA exportin) (Exportin(tRNA)) [H. 
sapiens] 304 305 218205_s_at MKNK2 MAP kinase-interacting serine/threonine kinase 2 NM_017572.1 Hs.261828 306 307 204781_s_at TNFRSF6 tumor necrosis factor receptor superfamily, member 6 NM_000043.1 Hs.82359 308 309 202404_s_at COL1A2 collagen, type I, alpha 2 NM_000089.1 Hs.179573 310 311 204615_x_at IDI1 isopentenyl-diphosphate delta isomerase NM_004508.1 Hs.76038 312 313 200068_s_at CANX calnexin M94859.1 Hs.155560 314 315 212595_s_at DAZAP2 DAZ associated protein 2 AL534321 Hs.75416 316 317 219458_s_at FLJ22609 hypothetical protein FLJ22609 NM_022072.1 Hs.18740 318 319 204787_at Z39IG Ig superfamily protein NM_007268.1 Hs.8904 320 321 202034_x_at RB1CC1 RB1-inducible coiled-coil 1 NM_014781.1 Hs.50421 322 323 201178_at FBXO7 F-box only protein 7 NM_012179.1 Hs.5912 324 325 214687_x_at ALDOA aldolase A, fructose-bisphosphate AK026577.1 Hs.273415 33 34 210299_s_at FHL1 four and a half LIM domains 1 AF063002.1 Hs.239069 326 327 202146_at IFRD1 interferon-related developmental regulator 1 AA747426 Hs.7879 328 329 218241_at GOLGA5 golgi autoantigen, golgin subfamily a, 5 NM_005113.1 Hs.241572 330 331 221004_s_at ITM3 integral membrane protein 3 NM_030926.1 Hs.111577 332 333 201160_s_at CSDA cold shock domain protein A AL556190 Hs.198726 334 335 211139_s_at NAB1 NGFI-A binding protein 1 (EGR1 binding protein 1) AF045452.1 Hs.107474 336 337 214211_at FTH1 ferritin, heavy polypeptide 1 AA083483 Hs.62954 93 94 214366_s_at ALOX5 arachidonate 5-lipoxygenase AA995910 Hs.89499 338 339 204271_s_at EDNRB endothelin receptor type B M74921.1 Hs.82002 3 4 200966_x_at ALDOA aldolase A, fructose-bisphosphate NM_000034.1 Hs.273415 340 341 200711_s_at TCEB1L transcription elongation factor B (SIII), polypeptide 1-like NM_003197.2 Hs.171626 342 343 218107_at FLJ21016 hypothetical protein FLJ21016 NM_025160.1 Hs.289069 344 345 212433_x_at RPS2 ribosomal protein S2 AA630314 Hs.356360 346 347 202953_at C1QB complement component 1, q subcomponent, beta polypeptide NM_000491.2 Hs.8986 348 
349 219117_s_at FKBP11 FK506 binding protein 11, 19 kDa NM_016594.1 Hs.24048 350 351 221704_s_at FLJ12750 hypothetical protein FLJ12750 BC005882.1 Hs.77870 353 354 202375_at SEC24D SEC24 related gene family, member D (S. cerevisiae) NM_014822.1 Hs.19822 355 356 202983_at SMARCA3 SWI/SNF related, matrix associated, actin dependent AI760760 Hs.3068 regulator of chromatin, subfamily a, member 3 357 358 200652_at SSR2 signal sequence receptor, beta (translocon-associated NM_003145.2 Hs.74564 protein beta) 359 360 209032_s_at IGSF4 immunoglobulin superfamily, member 4 AF132811.1 Hs.70337 361 362 203845_at PCAF p300/CBP-associated factor AV727449 Hs.199061 363 364 209833_at CRADD CASP2 and RIPK1 domain containing adaptor U79115.1 Hs.155566 with death domain 95 96 214135_at CLDN18 claudin 18 BE551219 Hs.278966 365 366 201063_at RCN1 reticulocalbin 1, EF-hand calcium binding domain NM_002901.1 Hs.167791 367 368 205661_s_at PP591 hypothetical protein PP591 NM_025207.1 Hs.118666 369 370 218130_at MGC4368 hypothetical protein MGC4368 NM_024510.1 Hs.9732 371 372 212285_s_at AGRN agrin AF016903.1 Hs.273330 67 68 203883_s_at Rab11-FIP2 KIAA0941 protein BG249608 Hs.173656 373 374 218163_at MCT-1 MCT-1 protein NM_014060.1 Hs.102696 45 46 208881_x_at IDI1 isopentenyl-diphosphate delta isomerase BC005247.1 Hs.76038 375 376 200705_s_at EEF1B2 eukaryotic translation elongation factor 1 beta 2 NM_001959.1 Hs.275959 377 378 202386_s_at Homo sapiens KIAA0430 gene product (KIAA0430), mRNA NM_019081.1 Hs.30909 379 380 212143_s_at IGFBP3 insulin-like growth factor binding protein 3 NM_000598.1 Hs.77326 79 80 201997_s_at SHARP SMART/HDAC1 associated repressor protein NM_015001.1 Hs.184245 381 382 204396_s_at GPRK5 G protein-coupled receptor kinase 5 NM_005308.1 Hs.211569 383 384 200599_s_at TRA1 tumor rejection antigen (gp96) 1 NM_003299.1 Hs.82689 55 56 201432_at CAT catalase NM_001752.1 Hs.76359 65 66 214752_x_at FLNA filamin A, alpha (actin binding protein 280) AI625550 Hs.195464 53 54 
219282_s_at TRPV2 transient receptor potential cation channel, NM_015930.1 Hs.279746 subfamily V, member 2 385 386 203802_x_at WBSCR20A Williams Beuren syndrome chromosome region 20A NM_018044.1 Hs.272820 387 388 202732_at PKIG protein kinase (cAMP-dependent, catalytic) inhibitor gamma NM_007066.1 Hs.3407 389 390 202766_s_at FBN1 fibrillin 1 (Marfan syndrome) NM_000138.1 Hs.750 391 392 201091_s_at CBX3 chromobox homolog 3 (HP1 gamma homolog, Drosophila) BE748755 Hs.278554 393 394 201874_at FLJ21047 hypothetical protein FLJ21047 BF978611 Hs.14891 395 396 204867_at GCHFR GTP cyclohydrolase I feedback regulatory protein NM_005258.2 Hs.83081 397 398 206875_s_at SLK Ste20-related serine/threonine kinase NM_014720.1 Hs.105751 399 400 210645_s_at TTC3 tetratricopeptide repeat domain 3 D83077.1 Hs.118174 47 48 202310_s_at COL1A1 collagen, type I, alpha 1 NM_000088.1 Hs.172928 37 38 201535_at UBL3 ubiquitin-like 3 NM_007106.1 Hs.173091

1. A method for providing a patient diagnosis for lung cancer, comprising the steps of: (a) determining the level of expression of one or more genes or gene products in a first biological sample taken from the patient; (b) determining the level of expression of one or more genes or gene products in at least a second biological sample taken from a normal patient sample; and (c) comparing the level of expression of one or more genes or gene products in the first biological sample with the level of expression of one or more genes or gene products in the second biological sample; wherein a change in the level of expression of one or more genes or gene products in the first biological sample compared to the level of expression of one or more genes or gene products in the second biological sample is diagnostic of the disease.
 2. The method of claim 1, wherein one or more genes are selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, and
 399. 3. The method of claim 1, wherein one or more gene products are polypeptides selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 369, 398, and
 400. 4. A method for distinguishing between normal and disease tissues, comprising the steps of: (a) determining the level of expression of one or more genes or gene products in a first biological sample of a disease tissue; (b) determining the level of expression of one or more genes or gene products in at least a second biological sample taken from normal tissue; and (c) comparing the level of expression of one or more genes or gene products in the first biological sample with the level of expression of one or more genes or gene products in the second biological sample; wherein a change in the level of expression of one or more genes or gene products in the first biological sample compared to the level of expression of one or more genes or gene products in the second biological sample is indicative of a disease state.
5. The method of claim 4, wherein one or more genes are selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, and 399.
6. The method of claim 4, wherein one or more gene products are polypeptides selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, and 400.
7. A method to monitor the response of a patient being treated for lung cancer by administering an anti-cancer agent, comprising the steps of: (a) determining the level of expression of one or more genes or gene products in a first biological sample taken from the patient prior to treatment with the anti-cancer agent; (b) determining the level of expression of one or more genes or gene products in at least a second biological sample taken from the patient subsequent to the treatment with the anti-cancer agent; and (c) comparing the level of expression of one or more genes or gene products in the second biological sample with the level of expression of one or more genes or gene products in the first biological sample; wherein a change in the level of expression of one or more genes or gene products in the second biological sample compared to the level of expression of one or more genes or gene products in the first biological sample indicates the efficacy of the treatment with the anti-cancer agent.
8. The method of claim 7, wherein one or more genes are selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, and 399.
9. The method of claim 7, wherein one or more gene products are polypeptides selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, and 400.
10. A method for identifying a compound useful for the treatment of lung cancer comprising the steps of: (a) analyzing the level of expression of one or more genes and/or gene products in a cell or tissue sample prior to treatment with the compound; and (b) analyzing the level of expression of one or more genes and/or gene products in a cell or tissue sample subsequent to treatment with the compound; wherein a variation in the expression level of the gene and/or gene product is indicative of drug efficacy.
11. The method of claim 10, wherein one or more genes are selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, and 399.
12. The method of claim 10, wherein one or more gene products are polypeptides selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, and 400.
13. An array for distinguishing between normal and disease tissues, comprising two or more probes corresponding to two or more genes selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, and 399.
14. An array for distinguishing between normal and disease tissues, comprising two or more polypeptides selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, and 400.
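By way of illustration only, the comparing step recited in the method claims above (for example, step (c) of claim 4) may be sketched in software as follows. The function name `expression_changes`, the two-fold threshold, the `floor` guard against division by zero, and the numeric expression levels are all hypothetical assumptions introduced for this sketch; the claims recite only "a change" in the level of expression and do not specify a threshold, a unit, or a computational procedure.

```python
from math import log2


def expression_changes(first_sample, second_sample, min_fold=2.0, floor=1e-9):
    """Return {gene_id: log2 ratio} for genes whose expression level in
    first_sample differs from second_sample by at least min_fold.

    Both arguments map a gene identifier to a non-negative expression level.
    min_fold and floor are illustrative assumptions, not values from the claims.
    """
    threshold = log2(min_fold)
    changes = {}
    # Compare only genes measured in both samples.
    for gene_id in first_sample.keys() & second_sample.keys():
        first = max(first_sample[gene_id], floor)
        second = max(second_sample[gene_id], floor)
        ratio = log2(first / second)
        # Keep increases and decreases alike when they reach the threshold.
        if abs(ratio) >= threshold:
            changes[gene_id] = ratio
    return changes


# Hypothetical expression levels, invented for this example.
disease = {"SEQ ID NO: 1": 8.0, "SEQ ID NO: 3": 1.0, "SEQ ID NO: 5": 2.0}
normal = {"SEQ ID NO: 1": 2.0, "SEQ ID NO: 3": 1.1, "SEQ ID NO: 5": 8.0}

changes = expression_changes(disease, normal)
for gene_id, ratio in sorted(changes.items()):
    print(f"{gene_id}: log2 ratio {ratio:+.1f}")
```

The same comparison could be applied to samples taken before and after treatment (claims 7 and 10) by passing the post-treatment sample as the first argument and the pre-treatment sample as the second.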